The Role of Artificial Intelligence in Improving Criminal Justice System: Indian Perspective

Puneet Gawali

M.S. in Digital Forensics and Information Security, Institute of Forensic Science, Gujarat Forensic Sciences University. Address: Gandhinagar, Gujarat, India. E-mail: puneet. [email protected]

Reeta Sony

Assistant Professor, Centre for Studies in Science Policy, School of Social Sciences, Jawaharlal Nehru University, PhD. Address: New Mehrauli Road, JNU Ring Rd., New Delhi 110067, India. E-mail: [email protected]

The increase in cyber-attacks has created havoc in the criminal justice system. Understanding the purpose of a crime and countering it is a crucial task for law enforcement agencies. This research aims to show how Artificial Intelligence and Machine Learning, together with predictive analysis using soft evidence, can be used to sort existing criminal records with the help of metadata and thereby predict crime. It would help the police and intelligence bodies investigate cases more intelligently by referring to the database, and thus help society curb crime through quicker and more effective investigation. It would also assist analysts in tracking the activities and associations of criminal elements through their recent activities by extracting particular details from documents or records. The study shows how crime prediction can be approached and reports the accuracy of threat prediction for 28 states of India. The research indicates that if proper data is fed to the model, predictions are more likely to be accurate. The study also seeks to identify the psychosocial perspectives of crime and the reasons an individual engages in it.

Keywords

Artificial Intelligence, law enforcement, criminal justice, prediction algorithm, accuracy, machine learning, motives, cyber attacks, information technology laws.

For citation: Puneet G., Sony R. (2020) The Role of Artificial Intelligence in Improving Criminal Justice: Indian Perspective // Legal Issues in the Digital Era, no 3, pp. 78-96.

DOI: 10.17323/2713-2749.2020.3.78.96

This article is published under the Creative Commons Attribution 4.0 License

Introduction

Artificial Intelligence (AI) was first introduced in 1956 at Dartmouth by John McCarthy, who is regarded as its father [McCarthy J., 2006: 12-14]. Digital transformation brings risks, as technology is the first layer [Dmitrik N., 2020: 54-78]. In recent years, technologies based on AI and machine learning (ML) have steadily grown in capability and accessibility, and this growth shows no sign of abating [Caldwell M., 2020: 1-13]. Understanding future AI law, together with the advantages and disadvantages of AI, can help make AI beneficial to humanity [Cui Y., 2020: 187-191]. AI research and its regulation seek to balance the social benefits of innovation against potential harms and obstructions [King T., Aggarwal N., Taddeo M., Floridi L., 2020: 89-120]. The development, adoption, and promotion of AI are priorities of the Indian Government, with the aim of making life easier for society [Marda V., 2018: 1-19]. Precise and reliable details about where crimes occur, together with information describing those crimes, have provided an approach to understanding such crimes in other countries [Furtado V., 2010: 4-17]. The Routledge Handbook of Technology, Crime and Justice, edited by McGuire and Holt [McGuire M., Holt T., eds., 2017: 1-722], is evidence of criminology's burgeoning interest in technology [Hayward K., Maas M., 2020: 1-25]. An important prerequisite for implementing this research is to train judges in computing; laws should require that all judges be well trained to use this technology¹. Artificial Intelligence, the main emerging technology, invented by John McCarthy, is beneficial because it takes in all the data as it is. In contrast, a human mind has to select among different pieces of data before reasoning, which can lead to errors².
Information technologies and their applications have become more diverse and effective; COPLINK is one example. COPLINK is licensed software that bridges the gap between research and practice, helping police officers solve real-world crimes as they serve the community in a sophisticated and understandable way [Chen H., 2003: 271-285]. The COPLINK project brings together the University of Arizona's Artificial Intelligence Lab and the Tucson Police Department, where crime analysts, detectives, and sergeants use the technology [Hauck R., 2002: 30-37]. In this paper, we discuss how Artificial Intelligence could be used to address problems in the criminal justice system. Because it is difficult for courts of law to maintain a database of all criminal activities, we have tried to address this issue by feeding criminal data to a model we created, thereby improving the way investigations are conducted. Data plays a significant role in the criminal justice system, especially in predictive analysis³, since the data itself reveals information about the crime. It is crucial to consider the diverse and vast ethical dilemmas that arise in the criminal justice system, which involve making moral judgments and deciding what is right and wrong. Data mining can be used in understanding and designing crime detection models [Nath S., 2006: 41-44].

1 Available at: https://builtin.com/artificial-intelligence (accessed: 25.11.2019)

2 Available at: https://towardsdatascience.com/advantages-and-disadvantages-of-artificial-intelligence-182a5ef6588c (accessed: 25.11.2019)

Such ethics are maintained because the model cannot be biased and gives accurate results. We understand that using predictive analysis in policing is challenging. Still, this should not mean that law enforcement agencies avoid analytics or intelligence tools for improving investigations [Isaac W. 2017: 543]. With this research, risk assessment and investigation in the criminal justice system can become more sophisticated. One question that may be raised is who should be accountable for semi-automated decisions [Zavrsnik A., 2020: 567-583]. Since accuracy is directly proportional to the data fed in, the entire model depends on the specificity of the data. The tool can be useful to lawyers whether or not they are technically expert; both groups can use it to make predictions from different datasets [Alarie B., Niblett A., & Yoon A., 2018: 106-124].

1. Preparing the Model

The most crucial concept for approaching this topic is recidivism; through this model, we can keep a close watch on the behavior of various states and the crimes committed in them. Fig. 1 shows the steps through which our data is processed to obtain the desired results. Different programming languages and environments enable ML research and application development. Python has grown tremendously within the scientific computing community over the last decade, and most recent ML and deep learning libraries are Python-based [Raschka S. et al, 2020: 193]. We use Python to prepare the predictive analysis model and to carry out exploratory data analysis (EDA), which is useful when a dataset becomes large or when we need to understand complex relationships among the variables. Through this paper, we show how such data can be molded for better investigation.

First, the data is loaded in Python; we then clean the data and explore the information in the variables. As seen in Fig. 2, pandas provides data frames, Matplotlib provides plotting support, and NumPy provides scientific computing with N-dimensional array support.

3 How data plays a significant role. Available at: https://www.aclu.org/issues/privacy-technology/surveillance-technologies/ai-and-criminal-justice-devil-data (accessed: 09.04.2018)

[Flowchart boxes: Data Preparation, Algorithm Selection, Training the Model, Model Testing; annotations: 60% Training, 30% Testing]

Fig. 1. Model Process Flowchart

In [2]: import pandas as pd                 # pandas is a dataframe library
        import numpy as np                  # numpy provides N-dim object support
        import matplotlib.pyplot as plt     # matplotlib.pyplot plots data
        import seaborn as seabornInstance
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn import metrics
        import os

        %matplotlib inline

Fig. 2. Importing Libraries

Secondly, standardizing and visualizing the data is very important to ensure that the data fits the assumptions of the models. The Universal Rule of Law states that human rights, democracy, and development depend on the progress that organizations and governments can achieve on the criminal justice front. The primary objectives of criminal justice are controlling and preventing crime; maintaining law and order; protecting the fundamental rights of victims as well as of people in conflict with the law; punishing and rehabilitating those adjudged guilty of committing crimes; and protecting life and property against crime and criminality in general. This is considered the primary obligation of the state under the Constitution of India [Dhillon K., 2011: 27].

This paper thus gives an overview of how every police station can update its data and predict criminal behavior from any data available. The ability to import various libraries and functions is an advantage of using Python in this research, since the data can be adjusted easily, as can be seen in Fig. 3 and 4.

Accurately predicting rare events is difficult: because such events appear rarely in the data, the algorithm has few examples to learn from. We therefore need at least a small percentage of such events in the data to train on, so that we have a reasonable chance of determining how likely a person or state is to develop the behavior or motive for committing a crime. Importing pandas lets us easily search the columns by name and see how many times this is true. The last column in Fig. 3 is the threat column, which is coded in binary: 1 means that attacks are increasing drastically, whereas 0 means that the motives are mild. When a crime is predicted, questions arise about how far an algorithm or code can be trusted⁴. This research therefore throws light on this area, where the data itself decides everything: the more realistic the data, the higher the accuracy. Data mining and predictive analysis play an essential role in our lives⁵. If we look carefully at the available data, we can identify the states with high unemployment rates (according to a report by the Centre for Monitoring Indian Economy, CMIE). It is noteworthy that such states also have high cybercrime rates, which suggests that in these states computers are used to dupe people of money through various online frauds. The reasons behind this are the ability to remain anonymous and the wish to cause harm out of vengeance or other motives. Cybercriminals mostly exploit high-speed internet available at low cost to commit criminal activities without being caught, unless the states have well-maintained cybersecurity labs to curb such crimes.
The CMIE report further reveals that people aged 40 to 59 were able to retain their jobs, whereas people under 40 lost their jobs, which led to social tension, a desire for revenge, anger, and other motives for launching such cyber-attacks⁶.
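The column lookup and label counting described above can be sketched with pandas as follows. This is a minimal illustration, not the authors' code: the five rows are the first rows of the dataset as printed in Fig. 4 and Fig. 5 (a subset of columns), and the names `sample`, `risk_counts`, and `flagged` are assumptions introduced here.

```python
import pandas as pd

# First five rows of the dataset as printed in Fig. 4 and Fig. 5 (subset of columns).
sample = pd.DataFrame({
    "State_UT": ["Andhara Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisga"],
    "Fraud": [733, 2, 389, 351, 23],
    "Others": [236, 5, 832, 0, 61],
    "Risk": [0, 0, 1, 0, 0],
})

# Count how often each label occurs: 1 = threat increasing, 0 = mild.
risk_counts = sample["Risk"].value_counts().to_dict()

# Select, by column name, the states flagged as being under threat.
flagged = sample.loc[sample["Risk"] == 1, "State_UT"].tolist()

print(risk_counts)
print(flagged)
```

With so few positive labels in a sample, the difficulty of learning rare events mentioned above becomes visible immediately.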

The data shown in Fig. 3 presents the most frequent cybercrimes in the various states of India up to 2019. These mostly include crimes such as bullying on social media rather than full-fledged crimes requiring considerable technical skill, which indicates that certain age groups have launched such attacks to malign the image of the victim⁷.

4 How code can be trustworthy. Available at: https://www.smithsonianmag.com/innovation/artificial-intelligence-is-now-used-predict-crime-is-it-biased-180968337 (accessed: 05.03.2018)

5 What is Data Mining? Definition of Data Mining, Data Mining Meaning — The Economic Times (indiatimes.com). Available at: https://economictimes.indiatimes.com/definition/data-mining (accessed: 07.12.2020)

6 The recent unemployment data. Available at: https://www.cmie.com/kommon/bin/sr.php?kall=warticle&dt=2020-01-21%2009:51:47&msec=203 (accessed: 21.01.2020)

7 National Crime Records Bureau. Empowering Indian Police with Information Technology. Available at: https://ncrb.gov.in/en (accessed: 22.10.2020)

[Fig. 3. Table of cybercrime cases by motive for each State/UT; the cell values did not survive extraction. Columns: State_UT, Personal_Revenge, Anger, Fraud, Extortion, Causing_Disrepute, Prank, Sexual Exploitation, Political Motives, Terrorist Activities, Inciting Hate against Country, Disrupt Public Service, Sale purchase illegal drugs, Developing own business, Spreading, Psycho or Pervert, Steal Information, Abetment to Suicide, Others, Risk. Rows (names truncated as in the source): Andhara Pradesh, Arunachal Pradesh, Assam, Bihar, Chhattisga, Goa, Gujarat, Haryana, Himachal, Jammu &, Jharkhand, Karnataka, Kerala, Madhya P, Maharash, Manipur, Meghalaya, Mizoram, Nagaland, Odisha, Punjab, Rajasthan, Sikkim, Tamil Nad, Telangana, Tripura, Uttar Prac, Uttarakha, West Ben, A & N Island, Chandigar, D&N Have, Daman &, Delhi UT, Lakshadow.]

Fig. 4 shows the features of the data. A feature is something that is used to determine a result, and a column is the physical structure that stores the values of a feature or a result. In Fig. 12, the shape function displays the dimensions of the data as rows and columns; here we have 36 rows and 13 columns. We also check whether any null values are present in the dataset shown in Fig. 12. Matplotlib is used to create a function that cross-plots the features so that we can see when they are correlated. The data is then inspected in order to eliminate any columns or rows with no values that are no longer required. Duplicate rows containing the same values are removed in the same way. This is done programmatically because visual inspection is error-prone and cannot deal with the critical issue of correlated columns. pandas helps in identifying null values in our data: as we can see in Fig. 6, the isnull method checks each value in the data frame for nulls. In the correlation plot in Fig. 11, yellow denotes a very strong positive correlation and the other colors denote weaker correlation. The column names appear on both the horizontal and vertical axes of Fig. 11, forming a matrix that shows which columns contain correlated data.
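The inspection and cleaning steps described above might look like the following sketch. The toy frame and the names `raw` and `clean` are illustrative assumptions, not the study's data or code.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values only): one duplicated row and one all-empty column.
raw = pd.DataFrame({
    "State_UT": ["A", "B", "B", "C"],
    "Fraud": [10, 3, 3, 7],
    "Anger": [1, 0, 0, 2],
    "Unused": [np.nan, np.nan, np.nan, np.nan],
})

print(raw.shape)            # (rows, columns) before cleaning
print(raw.isnull().any())   # True for every column that contains a null

# Drop columns that hold no values at all, then drop exact duplicate rows.
clean = raw.dropna(axis=1, how="all").drop_duplicates().reset_index(drop=True)

print(clean.shape)
```

Doing this in code rather than by eye keeps the cleaning repeatable when the underlying records are updated.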

In [3]: os.getcwd()
        os.chdir('C:/Users/Puneet/CRIME RECORD')
        os.getcwd()

Out[3]: 'C:\\Users\\Puneet\\CRIME RECORD'

In [20]: cyber_data = pd.read_csv('Cyber new.csv')  # read dataset
         cyber_data.head()

Out[20]:

| | State_UT | Personal_Revenge | Anger | Fraud | Extortion | Causing_Disrepute | Prank | Sexual Exploitation | Political Motives | Terrorist Activities | Inciting Hate against Country | Disrupt Public Service | Sale purchase illegal drugs | Developing own business |
| 0 | Andhara Pradesh | 34 | 26 | 733 | 45 | 7 | 0 | 92 | 12 | 1 | 1 | 1 | 0 | 2 |
| 1 | Arunachal Pradesh | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Assam | 239 | 46 | 389 | 153 | 234 | 0 | 113 | 9 | 4 | 3 | 0 | 0 | 0 |
| 3 | Bihar | 5 | 8 | 351 | 2 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Chhattisga | 0 | 1 | 23 | 2 | 25 | 0 | 21 | 4 | 0 | 1 | 0 | 0 | 1 |

Fig. 4. Selecting the data


As we can see in Fig. 4 and 5, the data is fetched from the file path and then used for further data cleaning and correlation.

In [3]: os.getcwd()
        os.chdir('C:/Users/Puneet/CRIME RECORD')
        os.getcwd()

Out[3]: 'C:\\Users\\Puneet\\CRIME RECORD'

In [20]: cyber_data = pd.read_csv('Cyber new.csv')  # read dataset
         cyber_data.head()

Out[20]:

| Causing_Disrepute | Prank | Sexual Exploitation | Political Motives | Terrorist Activities | Inciting Hate against Country | Disrupt Public Service | Sale purchase illegal drugs | Developing own business | Spreading | Psycho or Pervert | Steal Information | Abetment to Suicide | Others | Risk |
| 7 | 0 | 92 | 12 | 1 | 1 | 1 | 0 | 2 | 14 | 2 | 0 | 1 | 236 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 |
| 234 | 0 | 113 | 9 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 832 | 1 |
| 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25 | 0 | 21 | 4 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 61 | 0 |

Fig. 5. Showing the data

2. Molding the Data

After cleaning the data of any extra columns or null values, we proceed to molding the data by inspecting it for any remaining issues. Algorithms are largely mathematical models that work best with numeric quantities, and once the data has been molded we can use it to train the algorithm. As seen in Fig. 6, the count, mean, standard deviation, and other statistics are calculated to check that the data has been molded accurately. In machine learning, a great deal of data manipulation is done by trial and error in search of the best accuracy. When data is manipulated, it is very easy to change its meaning, so these checks also help in detecting whether the data has gone wrong anywhere. The entire model was created in Jupyter Notebook, which keeps track of all changes and updates automatically [Perkel J. et al, 2018: 145-147]. We also have the interactivity of the Python interpreter, with which we can make our data simpler for prediction, as seen in Fig. 6 and 7.
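As a small illustration of the summary statistics mentioned above, the sketch below calls `describe()` on the five Fraud values printed in Fig. 4. The variable names `fraud` and `summary` are assumptions made for this example; the full dataset's statistics are those reported in Fig. 6 and 7.

```python
import pandas as pd

# Fraud counts for the first five states, as printed in Fig. 4.
fraud = pd.Series([733, 2, 389, 351, 23], name="Fraud")

# describe() returns count, mean, std, min, quartiles, and max in one call.
summary = fraud.describe()
print(summary)
```

A large gap between the median and the maximum, as here, is a quick sign that a few states dominate the totals.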

In [6]: cyber_data.describe()

Out[6]:

| | Personal_Revenge | Anger | Fraud | Extortion | Causing_Disrepute | Prank | Sexual Exploitation | Political Motives | Terrorist Activities | Inciting Hate against Country | Disrupt Public Service | Sale purchase illegal drugs |
| count | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 |
| mean | 22.055556 | 12.805556 | 418.083333 | 29.166667 | 33.666667 | 8.222222 | 56.388889 | 6.055556 | 1.222222 | 6.055556 | 0.583333 | 0.166667 |
| std | 44.940560 | 25.484807 | 1007.891615 | 54.345193 | 72.200119 | 31.825516 | 129.880886 | 12.094732 | 3.330475 | 13.259917 | 1.645340 | 0.560612 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 0.000000 | 6.750000 | 0.000000 | 0.000000 | 0.000000 | 2.750000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 5.000000 | 3.500000 | 41.000000 | 7.500000 | 3.500000 | 0.000000 | 15.500000 | 0.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 21.000000 | 10.250000 | 392.000000 | 26.500000 | 20.500000 | 3.000000 | 52.500000 | 4.000000 | 1.000000 | 3.000000 | 0.000000 | 0.000000 |
| max | 239.000000 | 129.000000 | 5441.000000 | 224.000000 | 343.000000 | 191.000000 | 724.000000 | 52.000000 | 16.000000 | 59.000000 | 9.000000 | 2.000000 |

In [7]: cyber_data.isnull().any()

Out[7]: State_UT                         False
        Personal_Revenge                 False
        Anger                            False
        Fraud                            False
        Extortion                        False
        Causing_Disrepute                False
        Prank                            False
        Sexual Exploitation              False
        Political Motives                False
        Terrorist Activities             False
        Inciting Hate against Country    False
        Disrupt Public Service           False
        Sale purchase illegal drugs      False
        Developing own business          False
        Spreading                        False
        Psycho or Pervert                False
        Steal Information                False
        Abetment to Suicide              False
        Others                           False
        Risk                             False

Fig. 6. Null values are checked

In [6]: cyber_data.describe()

Out[6]:

| | Prank | Sexual Exploitation | Political Motives | Terrorist Activities | Inciting Hate against Country | Disrupt Public Service | Sale purchase illegal drugs | Developing own business | Spreading | Psycho or Pervert | Steal Information | Abetment to Suicide | Others | Risk |
| count | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 | 36.000000 |
| mean | 8.222222 | 56.388889 | 6.055556 | 1.222222 | 6.055556 | 0.583333 | 0.166667 | 5.500000 | 18.638889 | 0.111111 | 0.444444 | 0.055556 | 137.666667 | 0.305556 |
| std | 31.825516 | 129.880886 | 12.094732 | 3.330475 | 13.259917 | 1.645340 | 0.560612 | 14.484474 | 102.147275 | 0.398410 | 1.229273 | 0.232311 | 349.023454 | 0.467177 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 2.750000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 50% | 0.000000 | 15.500000 | 0.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 14.000000 | 0.000000 |
| 75% | 3.000000 | 52.500000 | 4.000000 | 1.000000 | 3.000000 | 0.000000 | 0.000000 | 2.250000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 101.250000 | 1.000000 |
| max | 191.000000 | 724.000000 | 52.000000 | 16.000000 | 59.000000 | 9.000000 | 2.000000 | 75.000000 | 614.000000 | 2.000000 | 5.000000 | 1.000000 | 1931.000000 | 1.000000 |

In [7]: cyber_data.isnull().any()

Out [7] : State_UT False

Personal_Revenge False

Anger False

Fraud False

Extortion False

Causing_Disrepute False

Prank False

Sexual Exploitation False

Political Motives False

Terrorist Activities False

Inciting Hate against Country False

Disrupt Public Service False

Sale purchase illegal drugs False

Developing own business False

Spreading False

Psycho or Pervert False

Steal Information False

Abetment to Suicide False

Others False

Risk False

Fig. 7. Null Values shown in risk column

3. Testing Model's Accuracy

In this section we discuss the role of the machine learning algorithm. An algorithm can be thought of as the engine that drives the entire process. For our prediction, we use data containing examples of the results and try to predict future outcomes; the data is analyzed using scikit-learn and the algorithm's logic. This analysis evaluates the data against the mathematical model and logic associated with the algorithm, and the algorithm then uses the results of the analysis to adjust its internal parameters, producing a model that has been trained to best fit the features and give the best results. The best result is defined by evaluating a function specific to the particular algorithm. The fitted parameters are stored, and the model is then trained. We then use this model to predict on real data with the scikit-learn package in Python: the parameters of the trained model, together with the Python code, are used to predict whether or not a state is under threat of cyber-attack. Selecting an appropriate algorithm from scikit-learn was the most difficult part of this research.

Prediction implies supervised learning, so our first goal was to eliminate all other kinds of algorithm. Prediction can be further divided into two categories, regression and classification, where regression means predicting a continuous set of values. Since we are predicting a binary outcome (whether or not there is a threat), we eliminated all algorithms that do not support classification in general and binary classification in particular. Naïve Bayes, Logistic Regression, and Decision Tree are classic machine learning algorithms that support classification and also help in understanding more complex algorithms.
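One way to compare the three candidate classifiers named above is cross-validated accuracy. The sketch below is an assumption-laden illustration: it uses synthetic data from `make_classification` rather than the crime dataset, Gaussian Naïve Bayes as the Naïve Bayes variant, and 5-fold cross-validation, none of which is stated in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (an assumption; not the crime dataset).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4, random_state=0)

candidates = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On a dataset as small as 36 rows, such scores would vary considerably between folds, so they are best read as a rough guide rather than a ranking.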


In [8]: plt.figure(figsize=(15,10))
        plt.tight_layout()
        seabornInstance.distplot(cyber_data['Risk'])

Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0x1c303d03808>

[Plot: distribution of the Risk column; x-axis: Risk]

Fig. 8. Graph Denoting the Risk

In [13]: df.corr()

Out[13]:

| | Online Banking Frauds | Cyber Blackmailing/Threatening Sec 506, 503, 384 | Fake News on Social Media Sec 505 | Cyber Terrorism sec 66F | Tampering Computer Source | Identity Theft sec 66C | Computer related offence sec 66 | Ransomware | Offences other than Ransomware | Cyber Stalking/Bullying of Women/Children Sec 354D IPC |
| Online Banking Frauds | 1.000000 | 0.297261 | 0.210440 | -0.081656 | 0.297991 | -0.029883 | 0.278153 | 0.276576 | 0.322202 | 0.799312 |
| Cyber Blackmailing/Threatening Sec 506, 503, 384 | 0.297261 | 1.000000 | 0.472718 | 0.368497 | 0.154649 | -0.049758 | 0.156595 | 0.128745 | 0.154715 | 0.300828 |
| Fake News on Social Media Sec 505 | 0.210440 | 0.472718 | 1.000000 | 0.589458 | 0.252916 | -0.041483 | 0.231064 | 0.235354 | 0.270136 | 0.205406 |
| Cyber Terrorism sec 66F | -0.081656 | 0.368497 | 0.589458 | 1.000000 | 0.028105 | -0.042616 | 0.061458 | 0.034039 | 0.025907 | -0.047158 |
| Tampering Computer Source | 0.297991 | 0.154649 | 0.252916 | 0.028105 | 1.000000 | 0.065687 | 0.991130 | 0.992945 | 0.937691 | 0.028343 |
| Identity Theft sec 66C | -0.029883 | -0.049758 | -0.041483 | -0.042616 | 0.065687 | 1.000000 | 0.014784 | 0.003531 | 0.074809 | 0.000353 |
| Computer related offence sec 66 | 0.278153 | 0.156595 | 0.231064 | 0.061458 | 0.991130 | 0.014784 | 1.000000 | 0.998242 | 0.955151 | 0.010488 |
| Ransomware | 0.276576 | 0.128745 | 0.235354 | 0.034039 | 0.992945 | 0.003531 | 0.998242 | 1.000000 | 0.949917 | 0.006252 |
| Offences other than Ransomware | 0.322202 | 0.154715 | 0.270136 | 0.025907 | 0.937691 | 0.074809 | 0.955151 | 0.949917 | 1.000000 | 0.060779 |
| Cyber Stalking/Bullying of Women/Children Sec 354D IPC | 0.799312 | 0.300828 | 0.205406 | -0.047158 | 0.028343 | 0.000353 | 0.010488 | 0.006252 | 0.060779 | 1.000000 |

Fig. 9. Correlation Performed

Logistic regression has a somewhat misleading name: in statistics, regression usually implies continuous values, but logistic regression returns a binary result. The algorithm measures the relationship of each feature to the result and weighs the features according to their impact on it. The resulting value is then mapped against a curve, seen in Fig. 8, whose output corresponds to threat or no threat.
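The curve that maps a score to a binary outcome can be sketched with the standard logistic (sigmoid) function. The function names and the 0.5 decision threshold below are assumptions made for illustration; the paper does not state the threshold it used.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z: float, threshold: float = 0.5) -> int:
    """Return 1 (threat) if the logistic output reaches the threshold, else 0 (no threat)."""
    return 1 if sigmoid(z) >= threshold else 0


for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 3), classify(score))
```

Here `z` stands for the weighted sum of the features; positive scores fall on the "threat" side of the curve and negative scores on the "no threat" side.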

def plot_corr(df, size=11):
    """
    Function plots a graphical correlation matrix for each pair of columns in the dataframe.

    Input:
        df: pandas DataFrame
        size: vertical and horizontal size of the plot

    Displays:
        matrix of correlation between columns. Blue-cyan-yellow-red-darkred => less to more correlated
                                               0 ---------------------> 1
                                               Expect a yellow line running from top left to bottom right
    """
    corr = df.corr()  # pairwise correlation of the data frame's columns
    fig, ax = plt.subplots(figsize=(size, size))
    ax.matshow(corr)  # color-code the correlation matrix
    plt.xticks(range(len(corr.columns)), corr.columns)  # column names on the x axis
    plt.yticks(range(len(corr.columns)), corr.columns)  # column names on the y axis
    plt.setp(ax.get_xticklabels(), rotation=90, horizontalalignment='right')

Fig. 10. Giving the values for correlation

In [10]: plot_corr(cyber_data)

Fig. 11. Correlation graph

4. Training the Model

We split the cyber data into two sets, one for training the model and the other for testing it: about 70% of the data went into the training set and 30% into the testing set. We then trained the algorithm on the training data and held the test data aside for evaluation. The training process produces a trained model based on the logic of the algorithm and the values of the features in the training data. Care was taken not to use all the data for training, since the data drives the training of the model and held-out data is needed to evaluate it. The Python library that handles machine learning training and evaluation tasks is scikit-learn; it provides a set of simple and efficient tools that can manage many of the tasks in machine learning.

scikit-learn is built on Python libraries such as NumPy, SciPy, and Matplotlib, and it works with these as well as with pandas data frames. It is essentially a toolset that makes training and evaluation tasks simple; these tasks include splitting the data into training and test sets, preprocessing data before training, selecting the most important features, creating the trained model, and tuning the model for better performance.
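To make the split concrete, the sketch below applies `train_test_split` with a 30% test share to placeholder arrays of the same shape as the feature matrix reported in Fig. 12 (36 rows, 18 features). The zero-filled `X`, the alternating labels in `y`, and the use of `stratify` are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the shape reported in Fig. 12: 36 rows, 18 features.
X = np.zeros((36, 18))
y = np.array([0, 1] * 18)  # illustrative labels, not the study's Risk column

# test_size=0.3 holds out 30% of the rows; stratify keeps the 0/1 ratio
# similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print(X_train.shape, X_test.shape)
```

With only 36 rows the test set contains about a dozen states, so a single misclassified state shifts the measured accuracy by several percentage points.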

In [11]: x = cyber data.drop(['Risk', 'State UT'], axis=1) # Independent Variables

y = cyber data[['Risk']] # Dependent Variable In [12]: X.shape Out [12]: (36, 18)

In [24]: # Splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
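To make the effect of this call concrete, the sketch below applies the same `test_size=0.4` and `random_state=1` to a synthetic array of the same shape as X (36 rows, 18 features). The random values, the assumed 24/12 class balance, and the added `stratify` argument are illustrative assumptions and do not come from the original notebook; stratification is shown only because it keeps the class proportions equal in both parts, which can matter with so few rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as X in the text (36 rows, 18 features);
# the values and labels are made up and are not the study's data.
rng = np.random.default_rng(1)
X_demo = rng.integers(0, 50, size=(36, 18))
y_demo = np.array([0] * 24 + [1] * 12)  # assumed class balance, for illustration

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=1, stratify=y_demo
)
print(X_tr.shape, X_te.shape)    # sizes of the training and test parts
print(y_tr.mean(), y_te.mean())  # share of "at risk" rows in each part
```

With 36 rows, a 0.4 test fraction leaves 21 rows for training and 15 for testing.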

In [14]: # Building the logistic regression model
reg = LogisticRegression()
reg.fit(X_train, y_train)

M:\Users\Puneet\anaconda3\lib\site-packages\sklearn\utils\validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)

M:\Users\Puneet\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Fig. 12. Applying regression algorithm
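The two warnings in the output above have common remedies: passing y as a 1-d array (via `ravel()`, as the first message itself suggests), and standardizing the features or raising `max_iter` so that the lbfgs solver converges. The sketch below shows one possible way to do both. It is not the authors' code: the synthetic data, the column names chosen for the toy frame, the use of `StandardScaler`, and `max_iter=1000` are assumptions for illustration.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.exceptions import ConvergenceWarning, DataConversionWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (36 rows); values are random, not the study's data
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.integers(0, 500, size=(36, 4)).astype(float),
    columns=["Anger", "Fraud", "Prank", "Others"],
)
# Toy target: 1 when "Others" is above its median, kept as a one-column frame like y in the text
y_demo = pd.DataFrame(
    {"Risk": (X_demo["Others"] > X_demo["Others"].median()).astype(int)}
)

# Scaling the count features helps lbfgs converge; max_iter is raised as a safeguard
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn the two warnings into errors so the fit fails loudly if they recur
    warnings.simplefilter("error", category=DataConversionWarning)
    warnings.simplefilter("error", category=ConvergenceWarning)
    model.fit(X_demo, y_demo["Risk"].to_numpy().ravel())  # 1-d target

pred = model.predict(X_demo)
print(pred.shape)
```

Wrapping the scaler and the classifier in one pipeline means the scaling parameters are learned from the training rows only and then reused unchanged at prediction time.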

5. Checking the Accuracy

Explanation for code:

Our research aimed to predict whether a particular State/UT is at a higher risk of cybercrime, using variables such as Personal revenge, Anger, Fraud, etc. To model this relationship, we used a statistical technique, logistic regression. Before modeling, we have to check whether there is any correlation between the independent variables (for example Personal revenge, Anger, Fraud, etc.). In the correlation plot, the diagonal should be ignored: the yellow diagonal blocks represent the correlation of a variable with itself (e.g., Personal revenge with Personal revenge), which is of no interest. Yellow represents high correlation, light green moderate correlation, dark green low correlation, and the darkest color no correlation. From the plot we can say that there is a high correlation between Sexual exploitation and Anger, between Spreading piracy and Prank, etc., since the blocks for these pairs are yellow. There is a moderate correlation between Spreading piracy and Causing disrepute, between Prank and Inciting hate against country, etc., since the blocks for these pairs are light green. Similarly, the variables with darker blocks have little or no correlation.

We divided the data into X and y, where X holds the independent variables and y denotes the dependent variable. Our independent variables are Personal revenge, Anger, Fraud, etc., and our dependent variable is Risk.

We then checked the shape of X (as a sanity check) and divided the data into train and test sets: X_train and y_train are used to train the logistic regression model, and X_test and y_test are used to test it. Next, we built a logistic regression model from X_train and y_train. Logistic regression is used when the dependent variable is dichotomous, i.e., True/False, Absent/Present, etc. Having built the model, we predicted the expected values of y from X_test and then checked the accuracy of the model. The accuracy of the model depends on the number of cases predicted correctly, i.e., the number of times we predicted that the State/UT is at risk and it actually was at risk, plus the number of times we predicted that the State/UT is not at risk and it actually was not at risk. Fig. 11, 12, and 13 show how the model behaves in predicting the threat level in the states.
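The definition of accuracy given above, correct "at risk" predictions plus correct "not at risk" predictions divided by all predictions, can be written out directly. The sketch below is a plain-Python illustration with hypothetical labels; the function names and the example lists are assumptions and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels for six states (1 = at risk, 0 = not at risk)
y_true_demo = [1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 1, 0, 0]

counts = confusion_counts(y_true_demo, y_pred_demo)
acc = accuracy(y_true_demo, y_pred_demo)
print(counts, acc)
```

Looking at the four counts separately, rather than only at their ratio, shows whether the errors are mostly missed "at risk" states or false alarms. If the test set holds 15 rows, as `test_size=0.4` on 36 rows implies in scikit-learn, the reported accuracy of 0.6667 would be consistent with 10 correct predictions out of 15.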

Out [14]: LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, multi_class='auto', n_jobs=None, penalty='l2', random_state=None, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)

In [15]: # Predicting the cases
y_pred = reg.predict(X_test)
y_pred

Out [15]: array([0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, ...], dtype=int64)

In [18]: metrics.accuracy_score(y_test, y_pred)

Out [18]: 0.6666666666666666

In [19]: cyber_data.head(5)

Out [19]:

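| Causing_Disrepute | Prank | Sexual Exploitation | Political Motives | Terrorist Activities | Inciting Hate against Country | Disrupt Public Service | Sale purchase illegal drugs | Developing own business | Spreading | Psycho or Pervert | Steal Information | Abetment to Suicide | Others | Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 0 | 92 | 12 | 1 | 1 | 1 | 0 | 2 | 14 | 2 | 0 | 1 | 236 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 |
| 234 | 0 | 113 | 9 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 832 | 1 |
| 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25 | 0 | 21 | 4 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 61 | 0 |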

Fig. 13. Model predicting the accuracy

Conclusion

From the study above, we can conclude that, after trying various algorithms by trial and error, some crucial points could be drawn with the help of the logistic regression algorithm. This research would help law enforcement agencies understand the root cause of a crime, for instance whether a political movement, a natural crisis, or massive dropouts in a particular state led a person to commit it. We cannot rely on this model completely in sentencing the accused: his or her parenting, upbringing, society, and teachings should also be examined to understand the reason behind the crime. Law enforcement agencies and the government can only bestow law upon us; the root cause of the crime should still be found and eradicated. The bigger question is how technology will shape the judicial function, and to what extent [Sourdin T., 2018: 1114], but it will surely benefit the judicial system in one way or another. Various sectors can benefit from this new technology, provided that it is not used to harm anyone and does not behave in unpredicted and potentially harmful ways [Cath C., 2018: 1-8]. Thus, with proper judicial monitoring of the data fed into it, this model can be put to good use.

References

Alarie B., Niblett A. & Yoon A. (2018) How artificial intelligence will affect the practice of law. University of Toronto Law Journal, vol. 68, supplement 1, pp. 106-124.

Caldwell M. et al (2020) AI-enabled future crime. Crime Science, no 1, pp. 1-13.

Cath C. (2018) Governing artificial intelligence: ethical, legal and technical opportunities and challenges. Philosophical Transactions of the Royal Society A, issue 2133, pp. 1-8.

Chen H. et al (2003) COPLINK Connect: information and knowledge management for law enforcement. Decision support systems, no 3, pp. 271-285.

Cui Y. (2020) Building AI-assisted rule of law for the future, seeking advantages and avoiding disadvantages to make AI better benefit mankind. In: Artificial Intelligence and Judicial Modernization. Singapore: Springer, pp. 187-191.

Dhillon K. (2011) The police and the criminal justice system in India. The Police, State, and Society: Perspectives from India and France. Pearson, pp. 27-59.

Dmitrik N. (2020) Digital State, Digital Citizen: Making Fair and Effective Rules for a Digital World. Legal Issues in the Digital Age, no 1, pp. 54-78.

Furtado V. et al (2010) Collective intelligence in law enforcement: the WikiCrimes system. Information Sciences, no 1, pp. 4-17.

Hauck R. et al (2002) Using Coplink to analyze criminal-justice data. Computer, no 3, pp. 30-37.

Hayward K., Maas M. (2020) Artificial intelligence and crime: A primer for criminologists. Crime, Media, Culture, pp. 1-25.

Isaac W. (2017) Hope, hype, and fear: the promise and potential pitfalls of artificial intelligence in criminal justice. Ohio St. J. Crim. L., vol. 15, p. 543.

King T., Aggarwal N., Taddeo M., Floridi L. (2020) Artificial intelligence crime: An interdisciplinary analysis of foreseeable threats and solutions. Science and engineering ethics, no 1, pp. 89-120.

Marda V. (2018) Artificial intelligence policy in India: a framework for engaging the limits of data-driven decision-making. Philosophical Transactions of the Royal Society: Mathematical, Physical and Engineering Sciences, vol. 376, pp. 1-19.

McCarthy J. et al (2006) A proposal for the Dartmouth summer research project on artificial intelligence. AI magazine, no 4, pp. 12-14.

McGuire M., Holt T. (eds.) (2017) The Routledge Handbook of Technology, Crime and Justice. L.: Taylor & Francis, pp. 1-722.

Nath S. (2006) Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent, pp. 41-44.

Perkel J. (2018) Why Jupyter is data scientists' computational notebook of choice. Nature, vol. 563, pp. 145-147.

Raschka S. et al (2020) Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information, no 4, p. 193.

Sourdin T. (2018) Judge v. Robot: Artificial Intelligence and Judicial Decision-Making. UNSW Law Journal, vol. 41, p. 1114.

Zavrsnik A. (2020) Criminal justice, artificial intelligence systems, and human rights. ERA Forum Springer, no 4, pp. 567-583.
