Machine Learning-based Soft Computing Regression Analysis Approach for Crime Data Prediction

Abstract

The crime rate in India is rising considerably. Consequently, the volume of crime-related data is also increasing, opening the door to data-driven approaches that extract insightful knowledge from these data and can help police and other law enforcement organizations of the country in crime control and prevention. Applying machine learning algorithms to crime data makes it possible to predict region-wise crime counts. In this paper, a machine learning-based soft computing regression analysis approach for Indian Crime Data Analysis (ICDA) is proposed. Different regression algorithms, namely Simple Linear Regression (SLR), Multiple Linear Regression (MLR), Decision Tree Regression (DTR), Support Vector Regression (SVR), and Random Forest Regression (RFR), are used to build regression models. These models can predict the total number of Indian Penal Code (IPC) crimes and the counts of different types of crime (murder, rape, kidnapping and abduction, riots, to name a few) region-wise, state-wise, and for the whole country for a given year. The adjusted R squared value and the Mean Absolute Percentage Error (MAPE) are used to evaluate and compare the proposed regression models. The proposed approach for ICDA uses district-wise spatio-temporal crime data for the years 2001–2012, collected from the official website of the National Crime Records Bureau (NCRB). For the chosen data, it is concluded that, for region-wise total IPC crime prediction, the RFR model fits best, with an adjusted R squared value of 0.9631551 and an error of 0.2027437. For region-wise theft crime count prediction, the RFR model also fits best, with an adjusted R squared value of 0.966604 and an error of 0.16571.
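
As an illustration of the two evaluation metrics named above, the following is a minimal sketch of how adjusted R squared and MAPE are commonly computed. It is not taken from the paper: the function names, the parameter `n_predictors`, and the sample counts are hypothetical, and MAPE is returned here as a fraction rather than a percentage, which is an assumption about how the reported error values are scaled.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_predictors):
    """Adjust R squared for the number of predictors in the model.

    Requires len(y_true) > n_predictors + 1.
    """
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (assumption).

    Observed values must be non-zero, since each error is divided
    by the corresponding observed value.
    """
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted crime counts for four regions.
observed = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

adj_r2 = adjusted_r_squared(observed, predicted, n_predictors=1)
error = mape(observed, predicted)
print(adj_r2, error)
```

A higher adjusted R squared and a lower MAPE both indicate a better fit, which is the sense in which the models are compared above.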