
ELECTROENCEPHALOGRAM-BASED EMOTION RECOGNITION USING A CONVOLUTIONAL NEURAL NETWORK

Savinov VB, Botman SA, Sapunov VV, Petrov VA, Samusev IG, Shusharina NN ✉

Immanuel Kant Baltic Federal University, Kaliningrad, Russia

The existing emotion recognition techniques based on the analysis of the tone of voice or facial expressions do not possess sufficient specificity and accuracy. These parameters can be significantly improved by employing physiological signals that escape the filters of human consciousness. The aim of this work was to carry out an EEG-based binary classification of emotional valence using a convolutional neural network and to compare its performance to that of a random forest algorithm. A healthy 30-year old male was recruited for the experiment. The experiment included 10 two-hour-long sessions of watching videos that the participant had selected according to his personal preferences. During the sessions, an electroencephalogram was recorded. Then, the signal was cleared of artifacts, segmented and fed to the model. Using a neural network, we were able to achieve a F1 score of 87%, which is significantly higher than the F1 score for a random forest model (67%). The results of our experiment suggest that convolutional neural networks in general and the proposed architecture in particular hold great promise for emotion recognition based on electrophysiological signals. Further refinement of the proposed approach may involve optimization of the network architecture to include more classes of emotions and improvement of the network's generalization capacity when working with a large number of participants.

Keywords: machine learning, artificial neural network, electroencephalogram, emotional state, valence, deep learning, convolutional networks

Author contribution: Savinov VB, Botman SA, Sapunov VV, Petrov VA — data acquisition and processing, manuscript preparation; Samusev IG — manuscript preparation and revision; Shusharina NN — project supervision and manuscript revision.

Compliance with ethical standards: this study was approved by the Ethics Committee of Immanuel Kant Baltic Federal University (Protocol № 7 dated March 26, 2019). The participant gave written informed consent to participate in the study and to the publication of his personal data.

✉ Correspondence should be addressed: Natalya N. Shusharina
Universitetskaya 2, Kaliningrad, 236006; nnshusharina@gmail.com

Received: 21.03.2019 Accepted: 16.05.2019 Published online: 29.06.2019

DOI: 10.24075/brsmu.2019.037


Emotions play a crucial role in our daily lives, affecting our perception, decision-making and social interactions. Emotions can have an outward manifestation, showing in the tone of voice or a facial expression, as well as evoke physiological changes invisible to the naked eye. Although self-assessment studies do yield useful information, there are certain issues with the reliability and validity of the obtained data [1]. Because both voice and facial expression can be mimicked, they can hardly serve as reliable indicators of a person's emotional state [2]. In contrast, the analysis of physiological signals fosters our understanding of basic emotional responses and the underlying biological mechanisms [3].

The following biosignals are commonly used to analyze a person's emotional state: galvanic skin response (GSR), electromyogram (EMG), heart rate (HR), respiratory rate (RR), and electroencephalogram (EEG). Of them, EEG is the most interesting: it reflects the activity of the cerebral cortex, which shapes a number of emotional responses. Although EEG has low spatial resolution, its temporal resolution is quite high, meaning that changes to the phase and frequency of the signal can be conveniently measured following exposure to an external emotional stimulus. Besides, electroencephalography is advantageously noninvasive, fast and inexpensive in comparison with other techniques for the acquisition of biological data.

As a rule, the EEG-based research into emotion recognition involves classification of a relatively small number of discrete states evoked by a specific stimulus. Usually, a raw EEG signal is passed through a filter first, and then features are extracted from it; classification is performed using a machine learning algorithm. The efficacy of this approach is largely determined by how features, which are the mathematically calculated signal attributes, are constructed and selected. Normally, feature construction accounts for the experimental and theoretical data on brain biology and the processes that an emotional stimulus induces in the brain.

The selected features are used to form feature vectors for machine learning models, such as random forests, multilayer perceptrons, support vector machines, k-nearest neighbors, etc. [4-6]. Classification accuracy of the listed models varies depending on the quality of input data, task criteria and the choice of a learning algorithm. In the case of an EEG signal, classification accuracy can be as high as 77% [7], whereas for multimodal signals it increases up to 83% [8]. The outcome is largely determined by the choice of a model and its parameters, signal features, and the techniques used to reduce data dimensionality.
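This classical feature-vector pipeline can be sketched with scikit-learn. The sketch below uses a randomly generated feature matrix as a stand-in for EEG-derived features; the data, feature count, and labels are illustrative assumptions, not the study's inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in for a feature matrix: one row per EEG segment,
# one column per hand-crafted feature (band powers, crossings, etc.)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))              # 1000 segments, 20 features
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # synthetic binary valence label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
score = f1_score(y_test, clf.predict(X_test))
```

Swapping `RandomForestClassifier` for a support vector machine or k-nearest-neighbors model changes only one line, which is why this pipeline is a common baseline.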

Convolutional neural networks and deep learning are an alternative to the aforementioned algorithms for EEG-based emotion recognition. Neural networks are successfully used to process electrophysiological signals [9]. At present, the analysis of EEG signals by convolutional networks is employed to solve a variety of medical tasks and aid brain-computer interactions, including seizure prediction [10], detection of P300 waves [11], and recognition of emotions [12, 13] using DEAP datasets [14]. Such wide range of tasks proves that convolutional neural networks are a versatile and robust tool. Deep learning allows the optimal features to be automatically generated during the training process and, therefore, cancels the need for manual feature selection. Still, some authors use externally computed features [15] and Fourier and wavelet transforms [16, 17] as input data for a neural network.

The aim of this work was to carry out an EEG-based binary classification of emotional valence by creating and training a convolutional neural network and to compare its performance to that of a random forest algorithm with explicitly formed feature vectors.

METHODS

One of the authors of this work, a healthy male aged 30 years without a history of psychiatric disorders, volunteered to be an experimental subject. Over a few weeks, he underwent multiple sessions involving exposure to different stimuli. The design of our experiment was an adaptation of a well-known method [18]. It is common to recruit more than one participant and conduct a single session with each of the subjects. In our case, although the data were collected from one subject, the total data amount was quite substantial because the sessions were repeated multiple times. People differ in their emotional response to the same stimulus, and this diversity complicates the identification of physiological patterns that correspond to certain emotional states. When only one participant is engaged in a series of experiments, the interpretability of data improves significantly because the perception style remains unchanged.

The initial set of videos used to elicit a positive or negative emotional state was compiled based on the personal preferences of the study participant. The participant was familiar with the selected videos, therefore his emotional state during the sessions was largely determined by his inner psychic life, memories, etc., and not by the external stimulus as such. Later, the set was expanded to include additional videos that were categorized as emotionally positive or negative based on their resemblance to the original films. Data acquisition took 2 weeks. Ten two-hour-long sessions were conducted. During a session that also included breaks, the participant watched six 15-min-long videos (two from each category). Each video was played only once. A positive video was always followed by a negative one.

EEG signals were recorded using a previously developed neurodevice [19]. Ag/AgCl electrodes with a conductive gel were attached to a special EEG cap at positions F3, F4, C3, C4, P3, P4, O1, O2 (the 10-20 system), as required by the monopolar recording technique. The electrode at position Fpz was used as a ground and reference. The sampling rate was 250 Hz. Second-order Butterworth filters with cutoff frequencies of 1 and 50 Hz were applied; additionally, a notch filter was used to remove 50 Hz powerline noise.
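A minimal sketch of this filtering chain using SciPy's standard filter-design routines is given below. The paper does not specify the implementation; reading the two cutoffs as a 1-50 Hz band-pass, and the notch quality factor `Q=30`, are our assumptions:

```python
import numpy as np
from scipy import signal

FS = 250  # sampling rate, Hz (as in the experiment)

# 1-50 Hz band-pass from second-order Butterworth sections,
# plus a notch filter for 50 Hz powerline interference
sos_band = signal.butter(2, [1, 50], btype="bandpass", fs=FS, output="sos")
b_notch, a_notch = signal.iirnotch(w0=50, Q=30, fs=FS)  # Q is an assumed value

def preprocess(eeg):
    """Zero-phase filtering; eeg has shape (n_channels, n_samples)."""
    band = signal.sosfiltfilt(sos_band, eeg, axis=-1)
    return signal.filtfilt(b_notch, a_notch, band, axis=-1)

# Example: 8 channels, 10 s of synthetic data
demo = np.random.default_rng(1).normal(size=(8, 10 * FS))
clean = preprocess(demo)
```

Zero-phase (`filtfilt`) filtering is a common choice for offline EEG processing because it avoids introducing phase distortion into the segments fed to the classifier.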

Once filtered, the data were standardized per channel and segmented into sliding windows of 2 s with a 0.2 s overlap. Then, the resulting dataset was fed to the neural network. A filtered, nonstandardized, segmented signal was used to calculate features. The selected features included those recommended in the literature [20]: high-order crossings up to the 6th order, band power in the delta, theta, alpha, beta and gamma frequency ranges, and the power asymmetry ratio.
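The standardization and windowing steps can be sketched as follows. At 250 Hz, a 2 s window is 500 samples and a 0.2 s overlap is 50 samples, so consecutive window starts are 450 samples apart; the synthetic data and helper names are illustrative:

```python
import numpy as np

FS = 250                 # sampling rate, Hz
WIN = 2 * FS             # 2 s window -> 500 samples
OVERLAP = int(0.2 * FS)  # 0.2 s overlap -> 50 samples
STEP = WIN - OVERLAP     # 450-sample stride between window starts

def standardize(eeg):
    """Z-score each channel independently; eeg: (n_channels, n_samples)."""
    mu = eeg.mean(axis=-1, keepdims=True)
    sd = eeg.std(axis=-1, keepdims=True)
    return (eeg - mu) / sd

def segment(eeg):
    """Slice into overlapping windows -> (n_windows, n_channels, WIN)."""
    n = (eeg.shape[-1] - WIN) // STEP + 1
    return np.stack([eeg[:, i * STEP:i * STEP + WIN] for i in range(n)])

# Example: one minute of 8-channel synthetic data
demo = np.random.default_rng(2).normal(size=(8, 60 * FS))
windows = segment(standardize(demo))
```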

All models were trained as follows: the EEG data obtained during each of the sessions were assigned to one of two classes depending on what type of stimulus was used to evoke the signal. The assignment was binary: positive stimuli were assigned to class 1, and negative stimuli to class 2. The network was trained using minibatch stochastic gradient descent (the Adam learning rate optimization algorithm [21], a minibatch size of 64, a learning rate of 0.001, 30 epochs) with categorical cross-entropy as the loss function. Thus, the input tensor had 3 dimensions [64, 8, 500], where 64 was the minibatch size, 8 was the number of EEG channels, and 500 was the length of an EEG segment, with an overlap of 50 time points between consecutive segments.
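The [64, 8, 500] tensor layout and the categorical cross-entropy target can be illustrated with NumPy. The arrays and the helper `categorical_crossentropy` are synthetic stand-ins, not the study's data or code:

```python
import numpy as np

# Synthetic stand-ins: each 2 s window is one training example of shape
# (8 channels, 500 samples); labels are one-hot over the two valence classes
rng = np.random.default_rng(3)
windows = rng.normal(size=(200, 8, 500))   # segments pooled across sessions
labels = rng.integers(0, 2, size=200)      # 0 = positive, 1 = negative
one_hot = np.eye(2)[labels]                # targets for categorical cross-entropy

# One minibatch for a stochastic gradient step: shape [64, 8, 500]
batch_x, batch_y = windows[:64], one_hot[:64]

def categorical_crossentropy(y_true, y_prob):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    return -np.mean(np.sum(y_true * np.log(y_prob + 1e-9), axis=1))

# A maximally uncertain classifier (p = 0.5 for both classes) scores ln 2
loss = categorical_crossentropy(batch_y, np.full((64, 2), 0.5))
```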

All stages of the experiment, including signal recording and processing as well as model training, involved the use of Python with the scikit-learn and Keras libraries.

RESULTS

We have created a neural network with the following architecture: 2 convolutional layers of 64 kernels each, a batch normalization layer, an ELU (exponential linear unit) layer, an average pooling layer (window size 4, stride 1); 2 convolutional layers of 64 kernels each, a batch normalization layer, a ReLU (rectified linear unit) layer, a max-pooling layer (window size 2, stride 1); 2 convolutional layers of 128 kernels each, a batch normalization layer, a ReLU layer, a max-pooling layer (window size 2, stride 1); and a fully connected layer consisting of 256 neurons with a ReLU activation function. The following parameters were used in all convolutional layers: kernel size 3, stride 1, padding 0. Softmax (normalized exponential function) was used as the activation function for the output layer.

[Figure: row-normalized confusion matrices (true label × predicted label, Positive/Negative). Random forest: Positive — 0.98, 0.02; Negative — 0.47, 0.53. Convolutional neural network: Positive — 0.79, 0.21; Negative — 0.07, 0.93.]

Fig. Confusion matrices for the random forest algorithm (left panel) and the convolutional neural network (right panel)
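The described stack maps naturally onto Keras layers. The sketch below is our reconstruction under stated assumptions: the paper does not give the input orientation, so a (time, channels) layout with 1D convolutions is assumed, and the compile settings follow the Methods section:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_samples=500, n_channels=8, n_classes=2):
    # (time, channels) input orientation is an assumption
    inputs = keras.Input(shape=(n_samples, n_channels))
    x = layers.Conv1D(64, 3)(inputs)        # kernel size 3, stride 1, no padding
    x = layers.Conv1D(64, 3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("elu")(x)
    x = layers.AveragePooling1D(pool_size=4, strides=1)(x)
    x = layers.Conv1D(64, 3)(x)
    x = layers.Conv1D(64, 3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling1D(pool_size=2, strides=1)(x)
    x = layers.Conv1D(128, 3)(x)
    x = layers.Conv1D(128, 3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling1D(pool_size=2, strides=1)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
```

With `padding` left at its `"valid"` default, each convolution shortens the time axis by 2 samples, matching the "padding 0" specification.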

The use of a neural network allowed us to achieve an F1 score of 87% on a validation sample, which is significantly higher than the F1 score for a random forest model (67%). Confusion matrices (see the Figure) demonstrate that the random forest model successfully identifies positive states, slightly outperforming the neural network, but has low specificity to negative states. By contrast, the deep learning model identifies and differentiates between the two studied states with comparable accuracy.
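For reference, the reported metrics can be reproduced from predictions with scikit-learn; the labels below are illustrative only, not the study's outputs:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Illustrative labels and predictions; NOT the study's actual outputs
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Row-normalized confusion matrix, as plotted in the figure
cm = confusion_matrix(y_true, y_pred, normalize="true")
f1 = f1_score(y_true, y_pred)  # F1 for the class labeled 1
```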

When testing the performance of our convolutional neural network on a dataset cleared of electrooculography (EOG) artifacts, we did not observe any significant changes in classification accuracy. The analysis of neural network activations demonstrated that EOG artifacts did not significantly skew classification results.

DISCUSSION

Today, neural networks are gradually replacing the well-established approaches to the analysis of electrophysiological signals that involve manual selection of features and classical machine learning models. Unsurprisingly, this trend is observed in the field of emotion recognition. One of the major differences between the approach proposed in this paper and its counterparts that also exploit convolutional networks is the use of an input signal without converting it into a frequency-domain representation (Fourier or wavelet transform).

Given that our model was trained and tested using the data obtained in the course of this particular experiment, we cannot compare its performance to that of its counterparts. However, it definitely outperforms classical models. The use of a neural network cancels the need for manual optimization of feature selection.

CONCLUSIONS

The proposed approach to emotion recognition is based on the use of a convolutional neural network and does not require signal conversion into a frequency-domain representation. Our model has demonstrated higher accuracy in comparison with a random forest algorithm. Further refinement of the approach will aim to improve its generalization capacity and to extend the algorithm to more classes of emotions. Effective techniques for emotion recognition could potentially be used for solving practical tasks in the fields of psychology and marketing.

References

1. Calvo RA, D'Mello S. Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE Transactions on affective computing. 2010; 1 (1): 18-37.

2. Jerritta S, Murugappan M, Nagarajan R, Wan K. Physiological signals based human emotion recognition: a review. Signal Processing and its Applications (CSPA), 2011 IEEE 7th International Colloquium on. IEEE. 2011; p. 410-5.

3. Li Q, Yang Z, Liu S, Dai Z, Liu Y. The study of emotion recognition from physiological signals. Advanced Computational Intelligence (ICACI). 2015 Seventh International Conference on. IEEE. 2015; p. 378-82.

4. Soroush MZ, Maghooli K, Setarehdan SK, Nasrabadi AM. Emotion classification through nonlinear EEG analysis using machine learning methods. International Clinical Neuroscience Journal. 2018; 5 (4): 135-49.

5. Liu J, Meng H, Nandi A, Li M. Emotion detection from EEG recordings. The 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE. 2016; p. 1722-7.

6. Ackermann P, Kohlschein C, Bitsch JA, Wehrle K, Jeschke S. EEG-based automatic emotion recognition: Feature extraction, selection and classification methods. 2016 IEEE 18th international conference on e-health networking, applications and services (Healthcom). IEEE. 2016; p. 1-6.

7. Mehmood RM, Du R, Lee HJ. Optimal feature selection and deep learning ensembles method for emotion recognition from human brain EEG sensors. IEEE Access. 2017; (5): 14797-806.

8. Yin Z, Zhao M, Wang Y, Yang J, Zhang J. Recognition of emotions using multimodal physiological signals and an ensemble deep learning model. Computer methods and programs in biomedicine. 2017; (140): 93-110.

9. Min S, Lee B, Yoon S. Deep learning in bioinformatics. Briefings in bioinformatics. 2017; 18 (5): 851-69.

10. Page A, Shea C, Mohsenin T. Wearable seizure detection using convolutional neural networks with transfer learning. 2016 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE. 2016; p. 1086-9.

11. Cecotti H, Graser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE transactions on pattern analysis and machine intelligence. 2011; 33 (3): 433-45.

12. Gao Y, Lee HJ, Mehmood RM. Deep learninig of EEG signals for emotion recognition. 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE. 2015; p. 1-5.

13. Tripathi S, Acharya S, Sharma RD, Mittal S, Bhattacharya S. Using Deep and Convolutional Neural Networks for Accurate Emotion Classification on DEAP Dataset. Twenty-Ninth IAAI Conference. 2017.

14. Koelstra S, Muhl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T et al. Deap: A database for emotion analysis; using physiological signals. IEEE transactions on affective computing. 2012; 3 (1): 18-31.

15. Li J, Zhang Z, He H. Hierarchical convolutional neural networks for EEG-based emotion recognition. Cognitive Computation. 2018; 10 (2): 368-80.

16. Kwon YH, Shin SB, Kim SD. Electroencephalography based fusion two-dimensional (2D)-convolution neural networks (CNN) model for emotion recognition system. Sensors. 2018; 18 (5): 1383.

17. Yuan L, Cao J. Patients' eeg data analysis via spectrogram image with a convolution neural network. International Conference on Intelligent Decision Technologies. Cham: Springer, 2017; p. 13-21.

18. Picard RW, Vyzas E, Healey J. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE transactions on pattern analysis and machine intelligence. 2001; 23 (10): 1175-91.

19. Shusharina NN, Borchevkin DA, Sapunov VV, Petrov VA, Patrushev MV. A Wireless Portable Platform for Physiological Potentials Registration and Processing: A Unified Approach. Journal of Pharmaceutical Sciences and Research. 2017; 9 (7): 1178.

20. Jenke R, Peer A, Buss M. Feature extraction and selection for emotion recognition from EEG. IEEE Transactions on Affective Computing. 2014; 5 (3): 327-39.

21. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980. [Preprint]. 2014 [cited 2014 Dec 22]. Available from: https://arxiv.org/abs/1412.6980.
