Section 7. Population economics
Wang Yuzhe, Cushing Academy E-mail: [email protected]
CRIME RATE PREDICTION USING HOUSE PROPERTIES VIA ARTIFICIAL NEURAL NETWORK VERSUS LINEAR REGRESSION MODELS
Abstract:
Objective: This study aimed to build a predictive model for crime rate based on 13 house features using artificial neural network versus linear regression models.
Methods: Boston housing data were used for this study; they are publicly available at https://archive.ics.uci.edu/ml/datasets/Housing. Per capita crime rate by town was the outcome of interest and thirteen other features were used as predictors, namely: 1) proportion of residential land zoned for lots over 25,000 sq.ft., 2) proportion of non-retail business acres per town, 3) Charles River dummy variable (= 1 if tract bounds river; 0 otherwise), 4) nitric oxides concentration (parts per 10 million), 5) average number of rooms per dwelling, 6) proportion of owner-occupied units built prior to 1940, 7) weighted distances to five Boston employment centers, 8) index of accessibility to radial highways, 9) full-value property-tax rate per $10,000, 10) pupil-teacher ratio by town, 11) 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town, 12) % lower status of the population, 13) median value of owner-occupied homes in $1000's. All the records were randomly assigned to 2 groups: a training sample (75%) and a testing sample (25%). Two models were built using the training sample: an artificial neural network and a linear regression. For the artificial neural network, the input layer has 13 inputs, the two hidden layers have 5 and 3 neurons, and the output layer has a single output. Mean squared errors (MSE) were calculated and compared between the two models. A cross validation was conducted using a loop for the neural network and the cv.glm function in the boot package for the linear model. The "neuralnet" package in R was used to conduct the neural network analysis.
Results: For the testing sample, the MSE was 93.5 for the linear regression and 64.7 for the artificial neural network; the artificial neural network clearly performed better. In cross validation, the average MSE for the neural network (37.0) was lower than that of the linear model (43.12), although there seems to be a certain degree of variation in the MSEs of the cross validation. This may depend on the splitting of the data or the random initialization of the weights in the net.
Conclusions: In this study, we built a predictive model for crime rate using a neural network and compared its performance with a more popular approach, linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for crime rate using commonly available housing features.
Keywords: crime rate, prediction model, linear regression, neural network.
1. Introduction
Based on a review of the extant literature and discussions with various officials at all jurisdiction levels across the country, it is highly doubtful that any serious, systematic forecasting of crime rates is done anywhere. It is safe to say that the current approach to forecasting crime, insofar as it exists, is extremely crude, for example, mapping crimes by police precinct or beat, and then assigning more resources to the areas with the most hits in the past.
Public safety is the most important metric for elected officials, especially at the local level, and allocating scarce crime fighting resources efficiently is an essential element of achieving this goal [1]. There are several potential reasons for this failure that come to mind. The first reason is that existing tools may simply be insufficient to provide meaningful forecasts. Technical forecasting, using economics models, computer technology, and mapping tools, is a modern phenomenon [2] and these methods are unproven, occasionally difficult to interpret, and occasionally expensive to set up and operate, especially for communities with tight budgets.
An artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical or computational model based on biological neural networks; in other words, it is an emulation of a biological neural system. This model has been used in medical research, but we are unaware of any studies in the literature that have applied an artificial neural network to commonly available housing features. We compared the performance of the artificial neural network with linear regression in terms of predictive ability.
2. Data and methods
Boston housing data were used for this study; they are publicly available at https://archive.ics.uci.edu/ml/datasets/Housing. Per capita crime rate by town was the outcome of interest and thirteen other features were used as predictors, namely: 1) proportion of residential land zoned for lots over 25,000 sq.ft., 2) proportion of non-retail business acres per town, 3) Charles River dummy variable (= 1 if tract bounds river; 0 otherwise), 4) nitric oxides concentration (parts per 10 million), 5) average number of rooms per dwelling, 6) proportion of owner-occupied units built prior to 1940, 7) weighted distances to five Boston employment centers, 8) index of accessibility to radial highways, 9) full-value property-tax rate per $10,000, 10) pupil-teacher ratio by town, 11) 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town, 12) % lower status of the population, 13) median value of owner-occupied homes in $1000's. All the records were randomly assigned to 2 groups: a training sample (75%) and a testing sample (25%). Two models were built using the training sample: an artificial neural network and a linear regression. For the artificial neural network, the input layer has 13 inputs, the two hidden layers have 5 and 3 neurons, and the output layer has a single output. Mean squared errors (MSE) were calculated and compared between the two models. A cross validation was conducted using a loop for the neural network and the cv.glm function in the boot package for the linear model. The "neuralnet" package in R was used to conduct the neural network analysis.
Variables:
1. CRIM      per capita crime rate by town
2. ZN        proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS     proportion of non-retail business acres per town
4. CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX       nitric oxides concentration (parts per 10 million)
6. RM        average number of rooms per dwelling
7. AGE       proportion of owner-occupied units built prior to 1940
8. DIS       weighted distances to five Boston employment centers
9. RAD       index of accessibility to radial highways
10. TAX      full-value property-tax rate per $10,000
11. PTRATIO  pupil-teacher ratio by town
12. B        1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
13. LSTAT    % lower status of the population
14. MEDV     median value of owner-occupied homes in $1000's
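The random 75%/25% assignment and the MSE used to compare the two models can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function names, the seed, and the use of row indices as stand-in records are assumptions made for the example.

```python
import random

def split_train_test(records, train_frac=0.75, seed=0):
    """Randomly assign records to a training and a testing sample."""
    rng = random.Random(seed)      # the seed is a hypothetical choice for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def mse(actual, predicted):
    """Mean squared error between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Row indices stand in for the 506 records; real rows would hold the 14 variables.
train, test = split_train_test(range(506))
print(len(train), len(test))                     # 380 126
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))     # (0 + 0 + 4) / 3
```

With 506 records, rounding 75% gives 380 training and 126 testing records, which agrees with the group sizes reported in Table 2. Which records fall in each group depends on the seed, so the group means will differ from run to run.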
3. Results
The mean per capita crime rate by town was 3.21 in the training group and 4.84 in the testing group; overall it was 3.61.
Table 2.- Crime Rate And House Properties In Training and Testing Groups
         Training Group (N=380)         Testing Group (N=126)          Overall Group (N=506)
Variable Mean    Std Dev  Min   Max     Mean    Std Dev  Min   Max     Mean    Std Dev  Min   Max
CRIM     3.21    7.44     0.01  73.53   4.84    11.36    0.01  88.98   3.61    8.6      0.01  88.98
ZN       12.52   24.49    0     100     7.87    19.05    0     95      11.36   23.32    0     100
INDUS    10.84   6.89     0.46  27.74   12.02   6.72     0.74  27.74   11.14   6.86     0.46  27.74
CHAS     7%      -        0     1       6%      -        0     1       7%      -        0     1
NOX      0.55    0.12     0.39  0.87    0.56    0.11     0.4   0.87    0.55    0.12     0.39  0.87
RM       6.28    0.72     3.56  8.78    6.31    0.66     4.93  8.4     6.28    0.7      3.56  8.78
AGE      68.47   27.65    6.2   100     68.9    29.73    2.9   100     68.57   28.15    2.9   100
DIS      3.87    2.17     1.14  12.13   3.56    1.9      1.13  9.19    3.8     2.11     1.13  12.13
RAD      9.19    8.51     1     24      10.63   9.24     1     24      9.55    8.71     1     24
TAX      401.11  164.98   187   711     429.72  177.79   188   711     408.24  168.54   187   711
PTRATIO  18.47   2.16     12.6  22      18.43   2.19     13    21.2    18.46   2.16     12.6  22
B        363.47  81.93    2.52  396.9   336.17  112.95   0.32  396.9   356.67  91.29    0.32  396.9
LSTAT    12.55   7.23     1.73  37.97   12.97   6.9      2.96  30.81   12.65   7.14     1.73  37.97
MEDV     22.54   8.98     5     50      22.51   9.86     5     50      22.53   9.2      5     50
Proportion of residential land zoned for lots over 25,000 sq.ft., nitric oxides concentration (parts per 10 million), weighted distances to five Boston employment centers, index of accessibility to radial highways, and 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town, were significant predictors for crime rate per capita by town (p < 0.05).
Table 3.- Linear Regression Model To Predict Crime Rate Per Capita By Town
Variable Estimate  Std. Error  t value  Pr(>|t|)
ZN       0.04      0.02        2.35     0.019 *
INDUS    -0.08     0.08        -1.10    0.273
CHAS     -0.49     1.04        -0.47    0.641
NOX      -10.25    4.79        -2.14    0.033 *
RM       -0.56     0.53        -1.05    0.292
AGE      0.01      0.02        0.69     0.492
DIS      -0.77     0.25        -3.08    0.002 **
RAD      0.51      0.08        6.24     0.000 ***
TAX      0.00      0.00        -0.48    0.629
PTRATIO  -0.12     0.17        -0.72    0.474
B        -0.02     0.00        -5.96    0.000 ***
LSTAT    0.12      0.07        1.78     0.077
MEDV     -0.10     0.05        -1.79    0.074
***: <0.001; **: <0.01; *: <0.05; .: <0.10
Figure 1. Artificial Neural Network For Crime Rate Per Capita By Town
The black lines show the connections between each layer and the weights on each connection, while the blue lines show the bias term added in each step. The bias can be thought of as the intercept of a linear model. The net is essentially a black box, so we cannot say much about the fitting, the weights, and the model. Suffice it to say that the training algorithm has converged and therefore the model is ready to be used.
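To make the roles of the weights and bias terms concrete, the sketch below pushes one record through a network with 13 inputs, hidden layers of 5 and 3 neurons, and a single output. It is an illustration in Python only: the weights are random rather than the trained values, and the logistic hidden activation with a linear output unit is an assumption, since the text does not state which activation function was used.

```python
import math
import random

LAYER_SIZES = [13, 5, 3, 1]   # inputs, two hidden layers, one output (from the text)

def init_network(layer_sizes, seed=0):
    """Random weights and biases; a stand-in for the trained values."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, inputs):
    """Propagate one record; hidden layers are logistic, the output is linear (assumed)."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        # Each neuron: weighted sum of the previous layer plus its bias (the "intercept").
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [logistic(s) for s in sums]
    return activations[0]

def count_parameters(layers):
    """Number of weights plus biases in the whole network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

net = init_network(LAYER_SIZES)
print(count_parameters(net))          # 92 = (13*5 + 5) + (5*3 + 3) + (3*1 + 1)
print(forward(net, [0.5] * 13))       # one predicted value for one (made-up) record
```

Setting every weight to zero leaves only the output bias as the prediction, which is the sense in which the bias acts like the intercept of a linear model.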
Figure 2. Real vs Predicted Crime Rate In Artificial Neural Network And Linear Regression Model (axis label in both panels: test$CRIM)
Figure 3. MSE for Artificial Neural Network for Testing Group
By visually inspecting the plot, we can see that the predictions made by the neural network are (in general) more concentrated around the line than those made by the linear model (a perfect alignment with the line would indicate an MSE of 0 and thus an ideal prediction).
Cross validation is another very important step in building predictive models. In cross validation, the average MSE for the neural network (40.6) was lower than that of the linear model (43.18), although there seems to be a certain degree of variation in the MSEs of the cross validation. This may depend on the splitting of the data or the random initialization of the weights in the net.
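The cross-validation loop and the fold-to-fold variation in MSE can be illustrated with a short Python sketch. It is not the study's code: the number of folds (10), the synthetic data, and the one-predictor least-squares model standing in for the fitted models are all assumptions chosen to keep the example self-contained.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a one-predictor stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_mse(xs, ys, k=10, seed=0):
    """Return the per-fold test MSEs of a k-fold cross validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]     # k disjoint folds
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_mses.append(sum(errors) / len(errors))
    return fold_mses

# Synthetic data (not the Boston data): a noisy linear relationship.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

fold_mses = k_fold_mse(xs, ys, k=10)
print("average MSE:", sum(fold_mses) / len(fold_mses))
print("min/max fold MSE:", min(fold_mses), max(fold_mses))
```

The spread between the smallest and largest fold MSE shows why a single train/test split can give a different picture from the cross-validated average; changing the seed changes the folds and therefore the individual MSEs.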
4. Discussion
Crime predictions can be developed through both qualitative and quantitative methods. Qualitative approaches to forecasting crime [3], such as environmental scanning, scenario writing, or Delphi groups, are particularly useful in identifying the future nature of criminal activity. In contrast, quantitative methods are used to predict the future scope of crime, and more specifically, crime rates. A common quantitative method for developing forecasts is to extrapolate annual crime rate trends developed through time series models. This approach also involves correlating past crime trends with factors that will influence the future scope of crime, in particular demographic and macro-economic variables.
In this study, we built a predictive model for crime rate using a neural network and compared its performance with a more popular approach, linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for crime rate using commonly available housing features.
According to the linear regression, proportion of residential land zoned for lots over 25,000 sq.ft., nitric oxides concentration (parts per 10 million), weighted distances to five Boston employment centers, index of accessibility to radial highways, and 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town, were significant predictors for crime rate per capita by town.
This study has limitations. One of them is associated with the artificial neural network method. This method employs deep machine learning to explore the nonlinear association between crime rate and house properties; however, the nonlinearity makes it very hard to interpret the results, especially the association between the crime rate and individual predictors. In addition, other predictors of crime rate were not available in this database.
In conclusion, we used both an artificial neural network and a linear regression model to predict the crime rate per capita by town. We found that the artificial neural network performed better than linear regression, which is the traditional method for building a predictive model. We believe that deep machine learning could be used in crime rate prediction. This might help improve public safety in the future via better resource allocation.
References:
1. Todd M. Henderson et al. Predicting Crime. University of Chicago Law School Chicago Unbound. 2008.
2. Olligschlaeger A. M. Artificial Neural Networks and Crime Mapping, in D. Weisburd and T. McEwen, eds., Crime Mapping, Crime Prevention, Crime Prevention Studies 8 (1998).
3. Stephen Schneider et al. Predicting Crime: A Review of the Research. Summary Report. Research and Statistics Division 2002.