USING THE K-NEAREST NEIGHBOR METHOD TO CATEGORIZE EEG DATA IN A CONCEALED INFORMATION TEST

Journal: Science and innovation
License: CC BY
Field: Medical technologies


Abdurashidova K. (1), Kurbonboev Kh. (2)

(1) Associate professor, Tashkent University of Information Technologies
(2) Master's student, Tashkent University of Information Technologies
https://doi.org/10.5281/zenodo.10396067

Abstract. Electrical activity of the brain is recorded using electrodes placed on the scalp. The primary goal is to distinguish the EEG data of guilty subjects from those of innocent subjects. Data were collected from 10 individuals. The raw EEG signals are first band-pass filtered. The crucial step is extracting features from the filtered EEG signals, here the time-domain Hjorth parameters: activity, mobility, and complexity. Classification into guilty and innocent categories is performed with a k-nearest neighbor classifier. To validate the deception detection system, 5-fold cross-validation is applied to each subject's data. Accuracy, sensitivity, and specificity are used to assess the classifier. Among the three Hjorth parameters tested, mobility gave the highest classification accuracy, reaching up to 96.7%.

Keywords: signal, key, sensitivity, accuracy.

Introduction

The Brain Computer Interface (BCI) is an advanced technology that taps into the computational abilities of the brain. Until recently, creating BCI systems seemed like something out of science fiction. However, the advent of electroencephalography (EEG) changed this perspective, motivating researchers to decipher the EEG signals obtained from the brain [1]. BCI consists of four primary phases: Signal Acquisition, Signal Processing, Feature Extraction, and Classification. Signal Acquisition involves recording EEG signals from the brain's surface (or scalp) using electrodes. These signals are then amplified through an amplifier to boost their strength and subsequently digitized. Signal Processing focuses on reducing noise and eliminating artifacts. This stage is vital in converting the obtained signals into a suitable form for the succeeding processing steps. The feature extraction phase aims to extract features that could potentially interpret a subject's messages or commands. BCI utilizes various features derived from the time domain, frequency domain, or a combination of both to enhance accuracy and performance. Extracting meaningful and pertinent information from these features is a daunting task. Sometimes, to simplify complexity, the feature vector is reduced to lower dimensions, but this can risk losing relevant information. Classification of the signals is then performed using these feature vectors. Therefore, selecting discriminative features is crucial. Ultimately, these classified signals are translated into meaningful commands for applications on a computer or connected devices like wheelchairs or prosthetic devices.

BCI finds extensive applications in gaming, robotics, forensics, medicine, etc. Our primary focus is utilizing BCI for lie detection. Lie detection involves assessing a statement with the intent to reveal concealed intentions to deceive. Traditional lie detection methods include the widely known polygraph test. In this test, an interrogator asks a series of questions requiring yes or no answers. Physiological indices such as blood pressure, heart rate, skin conductivity, and respiration are monitored as the person responds. Sudden changes in these indices might indicate deceit. However, these results are not strongly accepted as evidence in the legal system: skilled individuals can control their physiological responses, and an innocent person may be labeled as guilty merely because of fear, so the results can be misinterpreted.

EEG signals offer a solution to this issue, as they are involuntary and beyond a user's control. This aspect helps mitigate the problem of intentional manipulation. Figure 1 illustrates the fundamental BCI architecture, encompassing all four phases, and depicts the feedback provided to the user.

[Figure 1: block diagram with the stages Signal Acquisition, Signal Preprocessing, Feature Extraction (Hjorth parameters), Classification (K-Nearest Neighbor), Output, and Feedback]

Fig. 1. Architecture of Brain Computer Interface

The P300 (P3) wave is a well-studied event-related potential elicited by rare and meaningful stimuli, typically presented in what is known as the "oddball" paradigm. It is a positive deflection that typically occurs between 300 and 1000 milliseconds after stimulus onset. It is most prominent at the parietal site (Pz), weakest at the frontal site (Fz), and intermediate at the central site (Cz).

In the P300-based "Concealed Information Test (CIT)" or "Guilty Knowledge Test (GKT)," the focus is on the P300 amplitude, operating on the assumption that distinct responses occur when a person is presented with familiar items amidst a series of similar but unknown items. The premise is that when an individual is shown a known object, a P300 wave is generated. In experiments, an innocent person might be shown crime-related objects, yet a P300 wave won't be generated. Conversely, a guilty person might deny knowledge, but a P300 response still occurs. This concept forms the basis for lie detection experiments.

Numerous researchers have conducted tests and employed diverse statistical and machine learning approaches to classify EEG data into categories of guilt and innocence. For instance, Farwell et al. [2] conducted a "Guilty Knowledge Test" using Event-Related Brain Potentials (ERPs). In this test, three types of stimuli are used: probe stimuli, target stimuli, and irrelevant stimuli. A P300 response is triggered when an individual encounters a familiar object, such as the target stimuli. Someone can potentially be incriminated for a crime if they possess "guilty information," deny it, and yet exhibit a P300 response, because the response indicates that the person holds specific knowledge about the displayed object. The performance of the system was assessed using bootstrapped analysis, resulting in no false positives or false negatives; however, approximately 12.5% of cases produced indeterminate results.

Haider et al. [3] proposed a lie detection method employing Linear Discriminant Analysis (LDA) to separate positive and negative samples. They utilized sixteen channels and various signal extraction methods, implementing their work using MATLAB and Xilinx tools. Additionally, they executed the entire system on an FPGA to assess its efficiency, achieving 85% accuracy. Their method was claimed to be simpler and more convenient than previously proposed approaches.

Simbolon et al. [4] introduced a method based on Support Vector Machines (SVM) for distinguishing between guilty and innocent individuals using Event-Related Potentials (ERPs). Their experiment involved eleven males aged between 20 and 27. The data were split into training and testing sets, and different models were constructed. They attained an accuracy of up to 70.83%. In the present paper, Hjorth parameters are used for feature extraction and k-nearest neighbor as the classifier. The remainder of the paper covers the methodology (Section 2), results (Section 3), and conclusion (Section 4).

Methodology

Hjorth's Parameters

Hjorth, in 1970, introduced three statistical parameters [5]—activity, mobility, and complexity—utilizing the time domain to gather information about signals. These parameters have previously been employed for EEG feature extraction, particularly in the context of emotion recognition [6].

Activity: the variance (squared standard deviation) of the signal x(n), as given in Equation 1, where T is the number of time samples and \mu is the mean of x(n).

$$A(x(n)) = \frac{1}{T}\sum_{n=1}^{T}\bigl(x(n)-\mu\bigr)^{2} \qquad (1)$$

Mobility: the square root of the ratio between the variance of the signal's derivative and the variance of the signal [5], as shown in Equation 2,

$$M(x(n)) = \sqrt{\frac{\sigma^{2}(x'(n))}{\sigma^{2}(x(n))}} \qquad (2)$$

where x'(n) represents the derivative of the EEG signal x(n) and \sigma^{2} represents the variance of the data.

Complexity: the ratio of the mobility of the signal's derivative to the mobility of the signal, as shown in Equation 3. Complexity equals 1 for a pure sine wave and increases as the waveform departs from a sine wave [5].

$$C(x(n)) = \frac{M(x'(n))}{M(x(n))} \qquad (3)$$
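The three Hjorth parameters can be computed directly from their definitions. The following is a minimal sketch, not the authors' MATLAB implementation: it assumes the derivative x'(n) is approximated by the first difference of consecutive samples, and the 250 Hz sampling rate and the sine test signal are illustrative choices that do not come from the paper.

```python
import math
from statistics import pvariance


def first_difference(x):
    """Discrete stand-in for the derivative x'(n): x[n+1] - x[n]."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) of a 1-D signal x."""
    dx = first_difference(x)
    ddx = first_difference(dx)
    activity = pvariance(x)                                   # variance of x
    mobility = math.sqrt(pvariance(dx) / activity)            # M(x)
    mobility_dx = math.sqrt(pvariance(ddx) / pvariance(dx))   # M(x')
    complexity = mobility_dx / mobility                       # M(x') / M(x)
    return activity, mobility, complexity


# Illustrative input: a 10 Hz sine sampled at an assumed 250 Hz for 1 s.
fs = 250
sine = [math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]
activity, mobility, complexity = hjorth_parameters(sine)
```

For this sine input the activity is close to 0.5 (the variance of a unit-amplitude sine) and the complexity is close to 1, which is the value expected for a sinusoid.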

k-Nearest Neighbor Classifier

The k-Nearest Neighbor (KNN) classifier [7] is a non-parametric method that assigns a data point to the class held by the majority of its neighboring data points. The algorithm operates in two steps. First, it finds the k nearest neighbors of the data point. Second, it assigns a class to the data point based on those neighbors. To find the neighbors, the algorithm employs a distance metric such as the Euclidean distance, given in Equation 4.

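$$\mathrm{Distance}(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^{2}} \qquad (4)$$

where d is the dimension of the feature vectors x and y.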

It selects the k nearest samples from the training set and takes a majority vote over their classes; with two classes, k should be an odd number to avoid ties. Figure 2 illustrates the KNN classifier for two classes, class 1 and class 2. The red asterisks indicate class 1 and the blue circles indicate class 2. With k = 5, three of the five nearest neighbors belong to class 1 and two belong to class 2. Because the KNN classifier assigns a new sample to the class with the majority of votes among its k neighbors, the new test example is assigned to class 1.

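[Figure 2: scatter plot of class 1 and class 2 samples with a new example; 3/5 of its nearest neighbors belong to class 1 and 2/5 to class 2]

Fig. 2. K-nearest neighbor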
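The two steps of the classifier, finding the k nearest training samples by Euclidean distance and taking a majority vote, can be sketched as follows. This is an illustrative sketch only: the function name `knn_predict` and the toy 2-D points are invented for the example and are not data from the study. The points are chosen so that, as in the Figure 2 scenario, three of the five nearest neighbors belong to class 1 and two to class 2.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (Equation 4).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote; an odd k avoids ties when there are two classes.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data (invented for illustration, not taken from the study).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
          (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (5.0, 5.0)]
labels = ["class 1", "class 1", "class 1",
          "class 2", "class 2", "class 2", "class 2"]
prediction = knn_predict(points, labels, query=(1.8, 1.8), k=5)
```

Here `prediction` is "class 1", because the three class 1 points are closer to the query than any class 2 point and therefore win the vote 3 to 2.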

Data Acquisition

The data collection involved ten participants, nine males and one female, ranging in age from 20 to 25 years. None of them had any known psychological disorders, and all had normal or corrected vision. Before the EEG data recording commenced, all participants provided written informed consent to the experimenter.

The EEG data were recorded using Ag/AgCl electrodes positioned according to the international 10-20 system at Fz, FC1, FC2, C3, Cz, C4, CP5, CP1, CP2, CP6, P3, Pz, P4, O1, Oz, and O2. Additionally, electrooculogram (EOG) recordings were obtained, capturing vertical EOG (VEOG) and horizontal EOG (HEOG) from the right eye. For VEOG, electrodes were placed above and below the eye, while for HEOG, an electrode was positioned on the outer canthus. The reference electrode was placed on the mastoid and the ground electrode on the forehead.

The equipment used for signal acquisition included an EasyCap (a 32-channel EEG standard cap set from Munich, Germany) [8], a V-amp amplifier, a set of 16 electrodes, and BrainVision Recorder [9]. The protocol for electrode placement was similar to the one detailed in [10].

In the Concealed Information Test (CIT), participants take part in a mock-crime scenario across two sessions. The participants are categorized into two groups: the "guilty" and "innocent" groups. Instead of employing direct questioning [11], a specific set of images is displayed on a screen, acting as stimuli. These stimuli are designed to elicit different Event-Related Potential (ERP) responses in the subjects' brains. Similar to prior studies [11][12], three types of stimuli are presented to the subjects: target, irrelevant, and probe. The stimulus images are categorized as follows:

Target images: Images of well-known personalities known to all subjects.

Irrelevant images: A set of random unknown images shown to the subjects.

Probe images: Images related to the crime. The probe is an image of a person whom the guilty subject knows well and with whom the subject is asked to imagine having committed the crime (or an image of the victim).

Fig. 3. Electrode locations during the experiment.

Before the experiment commences, participants receive a concise overview of the experiment's purpose and procedures.

Once they understand the scenario, the experiment begins with images (stimuli) displayed on a 15.4-inch screen in front of the participants. Their task is to recognize each image and respond with either a "yes" or a "no" after it. Each image is displayed for 1.1 seconds, followed by a 2-second blank interval, and a total of ten images is presented to each subject in random order: seven irrelevant images, two target images, and one probe image.

The experiment unfolds across two sessions, with each session comprising 30 trials per subject, totaling 600 trials (2 sessions x 30 trials x 1 probe x 10 images).

In both the "guilty" and the "innocent" session, subjects are instructed to respond "yes" to the target image and "no" to both the probe image and the irrelevant images. The sessions differ in whether the probe is familiar to the subject: in the guilty session the "no" to the probe is a denial of knowledge, whereas in the innocent session it is a truthful answer.

The EEG data recorded in this real-environment scenario contain artifacts and therefore require preprocessing. A band-pass filter is applied for this purpose: it passes a specified band of frequencies and attenuates the components outside it, suppressing slow drifts and high-frequency noise while preserving the signal of interest. In this study, the band-pass filter operates within the range of 0.3 Hz to 30 Hz, a frequency range commonly analyzed during mental tasks performed by subjects [13].
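The effect of a 0.3 Hz to 30 Hz pass band can be illustrated on a synthetic signal. The paper does not state the filter type or the sampling rate, so the sketch below makes two assumptions: it uses a simple FFT mask (zeroing frequency bins outside the band) as a stand-in for whatever band-pass filter the authors applied, and it assumes a 250 Hz sampling rate. The function name `fft_bandpass` is hypothetical.

```python
import numpy as np


def fft_bandpass(signal, fs, low_hz=0.3, high_hz=30.0):
    """Zero every frequency bin outside [low_hz, high_hz] and transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(signal))


# Synthetic test signal: a constant offset, a 10 Hz component inside the
# pass band, and a 50 Hz component outside it. fs = 250 Hz is an assumption.
fs = 250
t = np.arange(4 * fs) / fs
in_band = np.sin(2 * np.pi * 10 * t)
raw = 5.0 + in_band + 0.5 * np.sin(2 * np.pi * 50 * t)
filtered = fft_bandpass(raw, fs)
```

After filtering, the constant offset and the 50 Hz component are removed and `filtered` matches the 10 Hz component. A practical pipeline would more likely use a designed IIR or FIR filter, since a hard spectral mask can introduce ringing on real, non-periodic recordings.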

Results

In the analysis of the Concealed Information Test (CIT), the EEG data obtained from the acquisition device underwent feature extraction and classification.

The CIT was conducted to investigate human behavior during deception, detailed in section 2. The primary objective was binary classification, categorizing the data into either "guilty" or "innocent" classes. Accurately identifying guilt is pivotal, making proper EEG data classification a significant milestone.

To delve deeper into the data collected during the two sessions, Independent Component Analysis (ICA) was employed on the raw EEG data. Specifically, ICA was applied separately to the data from the guilty and innocent sessions. Infomax ICA, implemented using a MATLAB toolbox, was used for this purpose. Figures 4 and 5 display the component maps generated after applying ICA.

These scalp maps illustrate the activity of components across each channel during the recording sessions.

For subsequent analyses, the EEG data were converted into numerical attributes using Brain Vision Analyzer 2.1 [9]. To validate the classifier, 5-fold cross-validation (5-FCV) was applied to each subject's EEG data.
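In 5-fold cross-validation, one subject's samples are divided into five disjoint folds; each fold serves once as the test set while the other four are used for training. A minimal sketch of the index split is given below. The helper name `k_fold_indices`, the shuffling seed, and the figure of 60 samples are illustrative assumptions and are not values stated in the paper.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# 60 samples is an illustrative figure, not a value stated in the paper.
splits = list(k_fold_indices(60, k=5))
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sample once for testing.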

Notably, this study involved subject-wise single-trial analysis, enhancing the granularity of the examination. The experiment was conducted utilizing an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz paired with 8 GB RAM. MATLAB 2016a was employed for the implementation of the experiment.

In evaluating the classification performance, diverse performance measures such as accuracy, sensitivity, and specificity were utilized, as detailed in reference [14].

These measures are commonly used to assess the effectiveness and efficiency of classification algorithms in handling the EEG data classification for guilt and innocence.
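The three measures follow from the counts of a binary confusion matrix: accuracy is the fraction of all samples classified correctly, sensitivity is the fraction of positive samples detected, and specificity is the fraction of negative samples correctly rejected. The sketch below treats "guilty" as the positive class, which is an assumption, and the label lists are invented for illustration.

```python
def binary_metrics(y_true, y_pred, positive="guilty"):
    """Return accuracy, sensitivity, and specificity for a two-class problem.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Invented labels: 5 guilty and 5 innocent samples.
y_true = ["guilty"] * 5 + ["innocent"] * 5
y_pred = ["guilty"] * 4 + ["innocent"] + ["innocent"] * 3 + ["guilty"] * 2
metrics = binary_metrics(y_true, y_pred)
```

With these labels, four of five guilty samples and three of five innocent samples are classified correctly, giving an accuracy of 0.7, a sensitivity of 0.8, and a specificity of 0.6.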

The results, as presented in Table 1, indicate that among the three Hjorth parameters examined, mobility demonstrated superior performance on the recorded EEG data. Specifically, it achieved an impressive accuracy of 96.7% for subject 6.

Fig. 4. Component maps for guilty session

Table 1. Performance of KNN using different Hjorth parameters on EEG-based CIT data (all values in %)

         Accuracy                            Sensitivity                         Specificity
Subject  Activity    Complexity  Mobility    Activity    Complexity  Mobility    Activity    Complexity  Mobility
1        60.0        78.2        80.0        65.0        63.3        73.3        64.0        96.0        88.0
2        56.4        66.0        85.5        60.0        76.0        92.0        80.0        56.0        80.0
3        50.0        73.3        78.3        90.0        93.3        70.0        50.0        53.3        86.7
4        54.0        86.0        94.0        44.0        92.0        96.0        56.0        80.0        92.0
5        50.0        70.0        96.0        68.0        68.0        96.0        52.0        72.0        96.0
6        60.0        36.7        96.7        70.0        50.0        93.3        50.0        23.3        100.0
7        43.3        61.8        70.0        56.7        56.7        66.7        50.0        68.0        73.3
8        55.0        65.0        68.3        80.0        80.0        86.7        56.0        50.0        50.0
9        45.0        56.7        76.7        76.7        86.6        96.7        53.3        26.7        56.7
10       61.7        56.7        73.3        90.0        73.3        80.0        53.3        40.0        66.7
Average  53.5        65.0        81.9        70.0        73.9        85.1        56.5        56.5        78.9

On average, the analysis resulted in an accuracy of 81.9%, with a sensitivity of 85.1% and a specificity of 78.9%.
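These averages can be checked against the per-subject values. The sketch below recomputes them from the Mobility columns of Table 1, assuming the columns are grouped as Accuracy, Sensitivity, and Specificity, each in the order Activity, Complexity, Mobility.

```python
from statistics import mean

# Per-subject Mobility values (subjects 1 to 10) copied from Table 1, in %.
mobility = {
    "accuracy": [80.0, 85.5, 78.3, 94.0, 96.0, 96.7, 70.0, 68.3, 76.7, 73.3],
    "sensitivity": [73.3, 92.0, 70.0, 96.0, 96.0, 93.3, 66.7, 86.7, 96.7, 80.0],
    "specificity": [88.0, 80.0, 86.7, 92.0, 96.0, 100.0, 73.3, 50.0, 56.7, 66.7],
}

# Average each measure over the ten subjects and round to one decimal place.
averages = {name: round(mean(values), 1) for name, values in mobility.items()}
```

The recomputed means round to 81.9, 85.1, and 78.9, in agreement with the reported averages.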

Table 2 compares these findings with existing approaches. The proposed k-Nearest Neighbor (k-NN) classifier with Hjorth parameters reaches an average accuracy of 81.9% on the Concealed Information Test (CIT) data, compared with 76.8% for the approach in [15] and 80.1% for the approach in [16].

Table 2 also includes a comparison with a different dataset on which the same kind of feature extraction was used. In [6], the authors applied Hjorth parameters, among other features, to an emotion recognition dataset and achieved an average accuracy of 35.9%. The gap shows that the effectiveness of this feature extraction approach varies considerably across different types of data.

Fig. 5. Component maps for innocent session.

Conclusion

In this article, we have introduced a method for detecting deceit by analyzing EEG signals. Detecting deceit is difficult because innocent individuals must not be wrongly convicted, so the accuracy and precision of the system's results are crucial. Our approach uses the Hjorth parameters (activity, mobility, and complexity) as features for a k-nearest neighbor classifier.

Through subject-specific analysis, we found that the mobility parameter produced the highest accuracy, reaching 96.7% for subject 6. Additional metrics used to assess the classifier's effectiveness include sensitivity, specificity, and the G-measure. This method has shown encouraging outcomes when distinguishing between the guilty and innocent categories in binary classification.

Table 2. Comparison with existing approaches.

Classification approach     Feature extraction technique                        Accuracy  Specificity  Sensitivity  G-measure
KNN [15]                    Nonparametric LDA                                   76.8%     70.0%        73.1%        76.5%
QDA (emotion dataset) [6]   Various (power, wavelet, Hjorth parameters, etc.)   35.9%     -            -            -
LDA [16]                    EMD                                                 80.1%     75.7%        75.7%        77.8%
KNN (proposed)              Hjorth parameters                                   81.9%     78.9%        85.1%        -

REFERENCES

1. Berger, H., 1930. "Ueber das Elektrenkephalogramm des Menschen". Journal für Psychologie und Neurologie.

2. Farwell, L.A., Donchin, E., 1991. "The truth will out: Interrogative polygraphy (lie detection) with event-related brain potentials". Psychophysiology 28, 531-547.

3. Haider, S.K., Daud, M.I., Jiang, A., Khan, Z., 2017. "Evaluation of P300 based lie detection algorithm". Electrical and Electronic Engineering 7, 69-76.

4. Simbolon, A.I., Turnip, A., Hutahaean, J., Siagian, Y., Irawati, N., 2015. "An experiment of lie detection based EEG-P300 classified by SVM algorithm". Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology (ICACOMIT), IEEE, 68-71.

5. Hjorth, B., 1970. "EEG analysis based on time domain properties". Electroencephalography and Clinical Neurophysiology 29, 306-310.

6. Jenke, R., Peer, A., Buss, M., 2014. "Feature extraction and selection for emotion recognition from EEG". IEEE Transactions on Affective Computing 5, 327-339.

7. Altman, 1992. "An introduction to kernel and nearest-neighbor nonparametric regression". The American Statistician 46, 175-185.

8. EasyCap. http://www.easycap.de/eZproducts/products.htm15. Accessed 18-05-2017.

9. Brain Products. http://www.brainproducts.com/. Accessed 18-05-2017.

10. Gao, Lu, Yang, Yu, Na, Rao, 2012. "A novel concealed information test method based on independent component analysis and support vector machine". Clinical EEG and Neuroscience 43, 54-63.

11. Abootalebi, Moradi, Khalilzadeh, 2009. "A new approach for EEG feature extraction in P300-based lie detection". Computer Methods and Programs in Biomedicine 94, 48-57.

12. Farwell, Donchin, 1991. "The truth will out: Interrogative polygraphy (lie detection) with event-related brain potentials". Psychophysiology 28, 531-547.

13. Rosenfeld, Soskins, Bosh, Ryan, 2004. "Simple, effective countermeasures to P300-based tests of detection of concealed information". Psychophysiology 41, 205-219.

14. Zhu, Zen, Wang, 2010. "Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS implementations". NESUG proceedings: health care and life sciences, Baltimore, Maryland 19.

15. Wang, D., Miao, D., Blohm, G., 2013. "A new method for EEG-based concealed information test". IEEE Transactions on Information Forensics and Security 8(3), 520-527.

16. Arasteh, A., Moradi, M.H., Janghorbani, A., 2016. "A novel method based on empirical mode decomposition for P300-based detection of deception". IEEE Transactions on Information Forensics and Security 11, 2584-2593.
