Article

A Novel DE-CNN-BiLSTM Multi-Fusion Model for EEG Emotion Recognition

Fachang Cui, Ruqing Wang, Weiwei Ding, Yao Chen and Liya Huang *
1 College of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2 National and Local Joint Engineering Laboratory of RF Integration and Micro-Assembly Technology, Nanjing 210003, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(4), 582; https://doi.org/10.3390/math10040582
Submission received: 29 December 2021 / Revised: 10 February 2022 / Accepted: 11 February 2022 / Published: 13 February 2022
(This article belongs to the Special Issue From Brain Science to Artificial Intelligence)

Abstract

As a long-standing research topic in the field of brain–computer interfaces, emotion recognition still suffers from low recognition accuracy. In this research, we present a novel model named DE-CNN-BiLSTM that deeply integrates the complexity of EEG signals, the spatial structure of the brain, and the temporal context of emotion formation. Firstly, we extract the complexity properties of the EEG signal by calculating Differential Entropy in different time slices of different frequency bands, obtaining 4D feature tensors organized according to brain location. Subsequently, the 4D tensors are input into a Convolutional Neural Network to learn the brain's spatial structure and output time sequences; after that, Bidirectional Long Short-Term Memory is used to learn the past and future information of the time sequences. Compared with existing emotion recognition models, the new model decodes the EEG signal more deeply and extracts key emotional features to improve accuracy. The simulation results show that the algorithm achieves an average accuracy of 94% on the DEAP dataset and 94.82% on the SEED dataset, confirming its high accuracy and strong robustness.

1. Introduction

Emotion is a psychological or physiological reflection of human senses, thoughts, and behaviors. Artificial intelligence robots can recognize emotional information through human facial expressions, body movements, and speech content [1]. The two-dimensional Valence–Arousal (VA) coordinate system is often used to evaluate emotional states: Valence represents the degree of emotional pleasure, and Arousal indicates the intensity of the emotion [2]. In recent years, with the rapid development of machine learning and signal processing and the significant improvement in electroencephalogram (EEG) signal acquisition technology, EEG-based emotion recognition has become a research focus in the artificial intelligence and biomedicine fields.
Extracting prominent features related to emotion from EEG signals is essential for achieving satisfactory recognition performance. Many methods have been used for feature extraction in existing studies, mainly including frequency-domain, time-frequency-domain, and non-linear dynamics extraction [3]. Padhmashree et al. [4] employed multivariate variational mode decomposition (MVMD) to extract time–frequency features from multichannel EEG signals. Bhattacharyya et al. [5] derived the Hilbert marginal spectrum based on the Fourier–Bessel Series Expansion based Empirical Wavelet Transform (FBSE-EWT) [6] method to measure temporal complexity by computing Shannon Entropy. Differential Entropy (DE) is an extension of Shannon Entropy to continuous signals and captures the temporal complexity of emotion-related brain activity more readily.
Traditional EEG emotion classification algorithms mainly include Support Vector Machine (SVM) [7], K-Nearest Neighbor (KNN) [8], and Random Forest. However, these algorithms cannot extract deeper emotion features, which may lead to low accuracy in emotion recognition. In recent years, emotion recognition models based on deep learning have developed rapidly, especially applications of the Convolutional Neural Network (CNN) [9]. Jiang et al. [10] proposed a WT-CNN model that decomposes the signal into multiple sub-bands containing emotion features through the wavelet transform; the features are then input to the CNN to learn the spatial correlation of the signal, achieving an accuracy of 80.56%. Zheng et al. [11] extracted DE features and input them into a Deep Belief Network (DBN) based on CNN, which achieved a recognition accuracy of 86.65%. Since EEG signals also have dynamic, time-varying characteristics, researchers need to learn not only their spatial position features but also their time-slice information. Long Short-Term Memory (LSTM) is a valid model for processing time sequences. Ozdemir et al. [12] proposed an effective CNN-LSTM model to classify emotion: EEG signals were converted to topologies according to the electrode locations and trained by a CNN, and then an LSTM was used to extract the temporal features from the consecutive time windows, which achieved 86.13% on Arousal and 90.62% on Valence. Although emotion recognition based on hybrids of CNN and LSTM has made progress, some challenges remain. In fact, emotion not only arises from past information but also has an impact on future activities. A unidirectional LSTM only learns past temporal information during training; the future part of the signal sequences cannot be learned. Fully integrating past and future emotional signals should therefore achieve better recognition performance. For this reason, we introduce Bidirectional Long Short-Term Memory to mine the context information of EEG signals.
In this paper, we propose a novel multi-fusion model, DE-CNN-BiLSTM, for EEG emotion recognition. The 4D feature tensors are constructed according to the electrode locations by applying DE in different time slices of different frequency bands; they contain the temporal complexity and the spatial location, as well as the past and future temporal features, of the EEG signal. We conduct simulations on both the DEAP and SEED datasets to verify the effectiveness of the model, which not only achieves high emotion recognition accuracy but also exhibits excellent robustness.

2. Methods

The DE-CNN-BiLSTM emotion recognition model proposed in this paper is shown in Figure 1.
The model contains the following parts:
(1)
The original EEG signals are decomposed into different frequency bands, which reflect different states of the brain, and divided into different time slices.
(2)
We calculate the DE of all slices in different frequency bands, then map them into the brain spatial structure to obtain the 4D tensors.
(3)
We utilize a CNN to capture the detailed information of the spatial structure and output a one-dimensional vector through the last layer of the CNN.
(4)
The vectors are input to the Bi-LSTM to complete the prediction of the emotional state based on the past and future information of the time sequences.
(5)
The softmax function is used as the classifier of the model to output the recognition results.

2.1. Multi-Band Decomposition and DE Feature Spatial Mapping

Human EEG signals are usually divided into four frequency bands, Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz), and Gamma (30–48 Hz), which reflect the characteristics of different emotional states. Therefore, the signal is first decomposed by frequency through a filter, and each band is then split into time slices with a non-overlapping 0.5 s Hanning window.
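As an illustration of this step, the sketch below band-pass filters each channel and cuts Hanning-windowed 0.5 s slices. It is a minimal example only: the 128 Hz sampling rate (the rate of the preprocessed DEAP data) and the fourth-order Butterworth filters are assumptions, since the paper does not specify the filter design.

```python
import numpy as np
from scipy.signal import butter, filtfilt, get_window

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 48)}

def decompose_and_slice(eeg, fs=128, slice_sec=0.5):
    """Band-pass filter each channel and cut non-overlapping 0.5 s Hanning-windowed slices.

    eeg: array of shape (n_channels, n_samples).
    Returns: dict band -> array of shape (n_channels, n_slices, slice_len).
    """
    slice_len = int(slice_sec * fs)
    n_slices = eeg.shape[1] // slice_len
    window = get_window("hann", slice_len)
    out = {}
    for band, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, eeg, axis=1)      # zero-phase band-pass filtering
        usable = filtered[:, : n_slices * slice_len]
        slices = usable.reshape(eeg.shape[0], n_slices, slice_len)
        out[band] = slices * window                 # apply the Hanning window to every slice
    return out
```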
Originating from information theory, differential entropy can measure the complexity of continuous signals. Assuming that μ and σ² are the mean and variance of the EEG signal x, the DE can be defined as Equation (1) [13,14,15].
$H(x) = -\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \log\!\left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right) dx = \frac{1}{2} \log(2\pi e \sigma^2)$
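For completeness, the closed form on the right-hand side follows by splitting the logarithm of the Gaussian density and using the fact that the expected value of $(x-\mu)^2$ equals $\sigma^2$:

$-\int p(x) \log p(x)\, dx = -\int p(x) \left[ -\tfrac{1}{2}\log(2\pi\sigma^2) - \tfrac{(x-\mu)^2}{2\sigma^2} \right] dx = \tfrac{1}{2}\log(2\pi\sigma^2) + \tfrac{1}{2} = \tfrac{1}{2}\log(2\pi e \sigma^2).$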
We calculate the DE for all time slices in four frequency bands and then get the DE feature matrix, which is expressed as follows:
$M_d^n = [v_d^n(1), v_d^n(2), \ldots, v_d^n(t)]$
where n is the number of electrode channels, d represents the frequency band number, and t denotes the number of the time slices.
In order to explore the relationship between the DE features of different time slices in different frequency bands, we map the DE onto the brain topology and thus construct the 4D tensor $X^{h \times w \times d \times t}$ shown in Figure 2, where h and w are the height and width of the topology (we set h = 9 and w = 9), d is the frequency band number, and t represents the number of time slices.
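A minimal sketch of the DE computation and the 9 × 9 spatial mapping, continuing the slicing sketch above, could look like the following. The electrode_grid mapping is a hypothetical placeholder: the actual electrode-to-grid coordinates are those of the 10–20 montage shown in Figure 2.

```python
import numpy as np

def differential_entropy(x):
    """DE of an (approximately Gaussian) signal slice: 0.5 * log(2 * pi * e * var)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def build_4d_tensor(band_slices, electrode_grid, h=9, w=9):
    """Map per-channel DE values onto an h x w scalp grid for every band and time slice.

    band_slices: dict band -> array (n_channels, n_slices, slice_len), e.g. from decompose_and_slice.
    electrode_grid: dict channel_index -> (row, col) position in the h x w topology
                    (hypothetical here; it must follow the layout of Figure 2).
    Returns: tensor of shape (h, w, n_bands, n_slices); grid cells without an electrode stay 0.
    """
    bands = list(band_slices.keys())
    n_slices = next(iter(band_slices.values())).shape[1]
    X = np.zeros((h, w, len(bands), n_slices))
    for d, band in enumerate(bands):
        for ch, (r, c) in electrode_grid.items():
            for t in range(n_slices):
                X[r, c, d, t] = differential_entropy(band_slices[band][ch, t])
    return X
```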

2.2. Spatial Feature Learning

The CNN was originally built on the principles of biological vision and perception. It consists of three parts: the input layer, the hidden layers, and the output layer [16,17,18,19,20,21]. The CNN part of the proposed model imitates the VGGNet model [22] and is shown in Figure 3. The 4D tensors are input to the CNN to learn the spatial features; the network contains three Convolutional Layers, a Max-Pooling Layer, and a Fully Connected (FC) Layer. Specifically, the first Convolutional Layer (Conv1) has 64 filters of size 3 × 3, and the next two Convolutional Layers (Conv2, Conv3) have 128 and 256 filters, respectively, also of size 3 × 3. All Convolutional Layers apply same padding and the Rectified Linear Unit (ReLU) activation function, and the output size after a Convolutional Layer is given by Equation (3). After the convolution operations, a Max-Pooling Layer with a filter size of 2 × 2 is applied to compress the amount of data and parameters and relieve overfitting. Finally, the output of the pooling layer is flattened to 1296 units and fed to a Fully Connected Layer. The final output Qt (1 × 512) is the representation of the spatial features of the original EEG slices.
Output_size = (Input_size − Kernel_size + 2 × Padding)/Stride + 1
The Convolutional Layers use same padding (Padding = 1) with a stride of 1, while the Max-Pooling Layer uses valid padding (Padding = 0) with a stride of 2. For example, a 9 × 9 input convolved with a 3 × 3 kernel, Padding = 1, and Stride = 1 yields an output size of (9 − 3 + 2 × 1)/1 + 1 = 9, so the spatial dimensions are preserved.
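A Keras-style sketch of this CNN branch, using the layer sizes stated above (64/128/256 filters of size 3 × 3 with same padding, a 2 × 2 max-pooling layer, and a 512-unit fully connected output), is given below. It is an illustrative reconstruction, not the authors' implementation; in particular, the exact flattened size depends on padding and stride details that the paper only partially specifies.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_branch(h=9, w=9, d=4):
    """CNN that maps one h x w x d DE frame to a 512-dimensional spatial feature Q_t."""
    inp = layers.Input(shape=(h, w, d))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)   # Conv1: 64 filters, 3 x 3
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)    # Conv2: 128 filters, 3 x 3
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)    # Conv3: 256 filters, 3 x 3
    x = layers.MaxPooling2D(pool_size=2)(x)                            # 2 x 2 max-pooling
    x = layers.Flatten()(x)                                            # flatten before the FC layer
    q_t = layers.Dense(512, activation="relu")(x)                      # spatial feature Q_t (1 x 512)
    return models.Model(inp, q_t, name="cnn_branch")
```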

2.3. Temporal Feature Learning

Long Short-Term Memory (LSTM) is a chain model used to process time sequences; its unique advantage is the use of memory cells in place of hidden-layer nodes, which effectively solves the problems of gradient vanishing and gradient explosion. After continuous time sequences are input into the model, it learns the temporal information of the EEG signal. The weights between the hidden layer and the output layer of the LSTM are reused at every time step [23], and it has a strong memory ability even when the information sequence is very long. An LSTM unit is composed of three gate control units, the forget gate, the input gate, and the output gate, whose calculation formulas are defined by Equations (4)–(9).
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C),$
$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t,$
$O_t = \sigma(W_O \cdot [h_{t-1}, x_t] + b_O),$
$h_t = \tanh(C_t) \times O_t,$
where $x_t$ represents the input time sequence, σ is the sigmoid function, the W terms are the weight matrices, and the b terms are the bias vectors of the corresponding weights. The forget gate $f_t$ determines how much of the previous information is retained; the previous hidden state and the current input are fed into the sigmoid function together. The input gate $i_t$ is responsible for updating the state of the LSTM unit. $C_t$ represents the cell state, and $h_t$ is the hidden output of the unit. The output gate $O_t$ controls the values passed to the next LSTM unit [24].
Compared with the unidirectional LSTM described above, the Bi-LSTM network adds a backward layer to learn future emotion information, which is the continuation of the past. Bi-LSTM combines bidirectional processing with the gating architecture, so it can memorize and process more information through two LSTM units [25]. The structure of the Bi-LSTM network is shown in Figure 4. The time sequences $x_i$ are input to the model; the forward network connects feature information from the past sequence to the present, while the backward network connects information from the future sequence to the present [26]. Finally, the predicted values $y_i$ are output by Equation (10).
$y_i = \sigma(W_h \cdot [\overrightarrow{h}_t, \overleftarrow{h}_t] + b_h),$
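To show how the pieces described in Section 2 fit together, the hedged sketch below wraps the CNN branch from the previous sketch over the time axis with TimeDistributed and feeds the resulting sequence of Q_t vectors into a Bi-LSTM followed by a softmax classifier. The hidden size of 128 and the default of two classes are illustrative choices, not values reported in the paper.

```python
from tensorflow.keras import layers, models

def build_de_cnn_bilstm(n_slices, h=9, w=9, d=4, n_classes=2, lstm_units=128):
    """Assemble CNN (spatial) + Bi-LSTM (temporal) + softmax, as outlined in Section 2."""
    seq_in = layers.Input(shape=(n_slices, h, w, d))            # sequence of h x w x d DE frames
    cnn = build_cnn_branch(h, w, d)                             # CNN branch from the Section 2.2 sketch
    q = layers.TimeDistributed(cnn)(seq_in)                     # Q_t for every time slice
    ctx = layers.Bidirectional(layers.LSTM(lstm_units))(q)      # past and future temporal context
    out = layers.Dense(n_classes, activation="softmax")(ctx)    # softmax emotion classifier
    return models.Model(seq_in, out, name="de_cnn_bilstm")
```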

3. Simulation and Result Analysis

In this part, we apply two public EEG datasets, DEAP and SEED, to validate the effectiveness of the proposed DE-CNN-BiLSTM model for emotion recognition. Firstly, we give a detailed description of the datasets. Then we analyze the DE features and the properties of the model. Finally, we compare it with other emotion recognition models to demonstrate its better performance.

3.1. Experimental Datasets

DEAP [27] is a multi-channel EEG dataset of 32 healthy subjects recorded by the research institutes of Queen Mary University of London and others. Each subject is required to wear an EEG cap so that the EEG signals evoked by 40 music video clips can be collected, and to fill out the SAM psychological scale afterwards. The electrode positions are distributed according to the international 10–20 system [18], using 32 channels. Each trial of DEAP contains 63 s of EEG signals: the first 3 s are baseline signals, during which the subject is in a silent state, and the remaining 60 s correspond to the emotionally evoked state. There are four dimensions of emotional states, Valence, Arousal, Dominance, and Liking, each rated on a scale from 1 to 9. As in Equation (11), we set the threshold value to 5: ratings above 5 are judged as High Valence (HV) and the rest as Low Valence (LV), and we set the label −1 for LV and +1 for HV.
$\text{Emotion label} = \begin{cases} \text{LV}, & 1 \le Valence \le 5 \\ \text{HV}, & 5 < Valence \le 9 \end{cases}$
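The label construction in Equation (11) amounts to simple thresholding of the self-reported Valence ratings; the helper below is an illustrative sketch (valence_labels is not a function from the paper):

```python
import numpy as np

def valence_labels(valence_ratings, threshold=5.0):
    """Map 1-9 SAM Valence ratings to -1 (Low Valence) or +1 (High Valence)."""
    ratings = np.asarray(valence_ratings, dtype=float)
    return np.where(ratings > threshold, 1, -1)

# Example: valence_labels([3.2, 5.0, 7.8]) -> array([-1, -1,  1])
```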
SEED is a public dataset for studying emotion recognition collected by Bao-Liang Lu's team [14] at Shanghai Jiao Tong University, including 15 subjects (7 males, 8 females). The 62 channels follow the standard international 10–20 system. Fifteen clips are selected from six movies, and the subjects watch the video materials to induce different emotions. Each clip of SEED lasts about 4 min, and subjects are given a 15 s rest before each clip. Every subject performs the experiment three times, with an interval of approximately one week between sessions. There are three types of emotions in SEED: Positive, Neutral, and Negative. We set −1 for the negative state, 0 for neutral, and +1 for positive [28].

3.2. DE Feature Analysis

We studied the characteristics of emotional changes by calculating the DE in different frequency bands. As shown in Figure 5, the horizontal and vertical axes represent time and frequency bands, respectively. In Figure 5a, one of the subjects, S01, is used to analyze the generation mechanism of positive emotion; the theta and beta bands in the frontal lobe, the alpha band in the occipital lobe, and the beta and gamma bands in the lateral temporal lobe show higher degrees of activation. The activation of the alpha and beta bands is greater than that of the theta and gamma bands.
In Figure 5b, similarly, subject S21, who watched a music video, is used to present negative emotion, from which we notice that the left and right temporal lobes and the occipital lobe are activated by the visual stimulation in all four frequency bands. The emotional intensity reaches its maximum at 50 s. The topology changes in the temporal lobe are obvious in the theta and alpha bands, and the changes in the occipital lobe are evident in the beta and gamma bands.
Comparing the topologies of positive and negative emotion, it can be concluded that the activated areas of the four frequency bands differ between the two states: the frontal, occipital, and temporal lobes for positive emotion, but the temporal and occipital lobes for negative emotion. What the two have in common is that the emotion fluctuates over time, which is consistent with the time-varying trajectory of the DE.

3.3. Result Analysis

For the DEAP dataset, the DE features are calculated as the inputs of the DE-CNN-BiLSTM model. Our DE-CNN-BiLSTM model is trained with a learning rate of 0.001, using the Adam optimization algorithm and a batch size of 64. The training progress of the proposed model, in terms of the training and validation accuracy on Valence, is shown in Figure 6. The classification outputs are then produced by softmax. Over-fitting is mitigated through 5-fold cross-validation, and the execution time of the model is 4254.1 s. Figure 7a is a box plot depicting the accuracy distribution of emotion recognition for the 32 subjects: the average accuracy is 94.86% on Arousal and 94.02% on Valence, the medians are 95.90% on Arousal and 94.80% on Valence, and the highest classification accuracy reaches 99.13% on Arousal. The three red circles mark three subjects whose accuracies are outliers.
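A hedged sketch of this training setup (Adam optimizer with a learning rate of 0.001, batch size 64, 5-fold cross-validation), reusing build_de_cnn_bilstm from the Section 2.3 sketch, is shown below. The epoch count is an assumption, and X and y stand for the 4D DE tensors and their integer class labels.

```python
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

def train_with_cv(X, y, n_slices, epochs=50):
    """5-fold cross-validated training with Adam (lr = 0.001) and a batch size of 64.

    X: array (n_samples, n_slices, 9, 9, 4); y: labels in {0, 1} (e.g. LV/HV remapped from -1/+1).
    """
    accs = []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, y):
        model = build_de_cnn_bilstm(n_slices)                   # sketch from Section 2.3
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(X[train_idx], y[train_idx], batch_size=64, epochs=epochs,
                  validation_data=(X[val_idx], y[val_idx]), verbose=0)
        accs.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])
    return sum(accs) / len(accs)                                # mean validation accuracy
```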
The SEED dataset has a total of 20,364 samples per subject [29]. We calculate the DE of the sample signals and input the values, arranged as 4D maps, into the proposed model. As shown in Figure 7b, the accuracies of all 15 subjects are above 90%, the median is 95.46%, and the average accuracy is 94.82%.
The above results show that the complexity, spatial features, and temporal features of EEG signals are important factors for emotion recognition. The DE-CNN-BiLSTM model makes full use of these features to obtain better recognition performance on two public datasets, which indicates its good generalization and robustness.
To verify the effectiveness of the proposed model, we compare it with existing models applied to the DEAP dataset on the binary classification of Arousal and Valence. As shown in Table 1, the baselines include traditional machine learning models such as the Support Vector Machine (SVM) [30] and the Artificial Neural Network (ANN) [31]; the accuracy of our model is more than 20% higher, which indicates that the deep learning network can extract more detailed emotion features and the spatial structure of the signals. We also investigate some advanced deep learning models, namely the Bidirectional Long Short-Term Memory network, the Convolutional Neural Network, and the related fusion models CNN-LSTM and 2DCNN-BiGRU. Compared with the single Bi-LSTM [32] and CNN [33,34] models, our method shows a substantial improvement, which demonstrates that our hybrid model extracts not only the spatial features but also the dynamic temporal features of EEG signals, and that the DE feature is also an important factor. Compared with the 2DCNN-BiGRU [35], the accuracy of our model is approximately 7% higher on Arousal and 5% higher on Valence, because the input of our model is a 4D structured feature containing the complexity, spatial, and temporal features. Meanwhile, compared with the CNN-LSTM [20], our method improves by 4.62% on Arousal and 4.57% on Valence, which shows that it can mine the past and future information of the signal sequences.

4. Conclusions

In this paper, we proposed a new EEG emotion recognition model named DE-CNN-BiLSTM, which fully takes into account the complexity and spatial structure of the brain and the temporal characteristics of dynamic EEG signals. The DE feature was used to decode the detailed emotion features of the brain, and the 4D spatial–temporal features based on DE were fed into the DE-CNN-BiLSTM model. The average accuracy reached 94.86% on Arousal and 94.02% on Valence for the DEAP dataset and 94.82% for the SEED dataset, approximately 4% higher than existing emotion recognition models. The model also has the advantages of good robustness and generalization, which is significant for future research on emotion recognition systems based on brain–computer interfaces. In future research, we will try to replace the CNN with another deep learning network, such as a Graph Convolutional Network (GCN) [36], to explore deeper spatial features of the electrodes.

Author Contributions

Investigation and methodology, F.C., L.H. and R.W.; software and validation, F.C.; formal analysis and data curation, R.W. and Y.C.; writing, F.C.; visualization, W.D.; funding acquisition, L.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (grant number 61977039) and 2019 Research Project of University Education Informatization (2019JSETKT009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Yang, H.; Han, J.; Min, K. A Multi-Column CNN Model for Emotion Recognition from EEG Signals. J. Sens. 2019, 19, 4736.
2. Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161.
3. Gao, Q.; Wang, C.-H.; Wang, Z.; Song, X.-L.; Dong, E.-Z.; Song, Y. EEG based emotion recognition using fusion feature extraction method. J. Multimed. Tools Appl. 2020, 79, 27057–27074.
4. Padhmashree, V.; Bhattacharyya, A. Human emotion recognition based on time–frequency analysis of multivariate EEG signal. Knowl.-Based Syst. 2022, 238, 107867.
5. Bhattacharyya, A.; Tripathy, R.K.; Garg, L.; Pachori, R.B. A novel multivariate-multiscale approach for computing EEG spectral and temporal complexity for human emotion recognition. IEEE Sens. J. 2020, 21, 3579–3591.
6. Bhattacharyya, A.; Singh, L.; Pachori, R.B. Fourier–Bessel series expansion based empirical wavelet transform for analysis of non-stationary signals. Digit. Signal Process. 2018, 78, 185–196.
7. Fang, J.; Wang, T.; Li, C.; Hu, X.; Ngai, E.; Seet, B.-C.; Cheng, J.; Guo, Y.; Jiang, X. Depression Prevalence in Postgraduate Students and Its Association With Gait Abnormality. IEEE Access 2019, 7, 174425–174437.
8. Sharma, L.D.; Bhattacharyya, A. A computerized approach for automatic human emotion recognition using sliding mode singular spectrum analysis. IEEE Sens. J. 2021, 21, 26931–26940.
9. Zhang, Y.; Chen, J.; Tan, J.H.; Chen, Y.; Chen, Y.; Li, D.; Yang, L.; Su, J.; Huang, X.; Che, W. An Investigation of Deep Learning Models for EEG-Based Emotion Recognition. Front. Neurosci. 2020, 14, 2759.
10. Jiang, H.; Wu, D.; Jiao, R.; Wang, Z. Analytical Comparison of Two Emotion Classification Models Based on Convolutional Neural Networks. Complex 2021, 2021, 6625141.
11. Zheng, W.L.; Lu, B.L. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175.
12. Ozdemir, M.A.; Degirmenci, M.; Izci, E.; Akan, A. EEG-based emotion recognition with deep convolutional neural networks. Biomed. Eng. Biomed. Tech. 2021, 66, 43–57.
13. Zhang, Q.; Ding, J.; Kong, W.; Liu, Y.; Wang, Q.; Jiang, T. Epilepsy prediction through optimized multidimensional sample entropy and Bi-LSTM. J. Biomed. Signal Process. Control 2021, 64, 102293.
14. Li, Y.; Wong, C.M.; Zheng, Y.; Wan, F.; Mak, P.U.; Pun, S.H.; I Vai, M. EEG-based emotion recognition under convolutional neural network with differential entropy feature maps. In Proceedings of the 2019 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), Tianjin, China, 14–16 June 2019; pp. 1–5.
15. Topic, A.; Russo, M. Emotion recognition based on EEG feature maps through deep learning network. J. Eng. Sci. Technol. Int. 2021, 24, 1442–1454.
16. Alarcao, S.M.; Fonseca, M.J. Emotions recognition using EEG signals: A survey. J. IEEE Trans. Affect. Comput. 2017, 10, 374–393.
17. Peters, J.M.; Taquet, M.; Vega, C.; Jeste, S.S.; Fernández, I.S.; Tan, J.; A Nelson, C.; Sahin, M.; Warfield, S.K. Brain functional networks in syndromic and non-syndromic autism: A graph theoretical study of EEG connectivity. J. BMC Med. 2013, 11, 54.
18. Bhavsar, R.; Sun, Y.; Helian, N.; Davey, N.; Mayor, D.; Steffert, T. The Correlation between EEG Signals as Measured in Different Positions on Scalp Varying with Distance. J. Procedia Comput. Sci. 2018, 123, 92–97.
19. Hwang, S.; Hong, K.; Son, G.; Byun, H. Learning CNN features from DE features for EEG-based emotion recognition. J. Pattern Anal. Appl. 2020, 23, 1323–1335.
20. Yang, Y.; Wu, Q.; Fu, Y.; Chen, X. Continuous convolutional neural network with 3D input for EEG-based emotion recognition. In Proceedings of the International Conference on Neural Information Processing, Siem Reap, Cambodia, 13–16 December 2018; Springer: Cham, Switzerland, 2018; pp. 433–443.
21. Shen, F.; Dai, G.; Lin, G.; Zhang, J. EEG-based emotion recognition using 4D convolutional recurrent neural network. J. Cogn. Neurodyn. 2020, 14, 815–828.
22. Day, M.J.; Horzinek, M.C.; Schultz, R.D. Compiled by the vaccination guidelines group (VGG) of the world small animal veterinary association (WSAVA). J. Small Anim. Pract. 2007, 48, 528.
23. Chen, X.; He, J.; Wu, X.; Yan, W.; Wei, W. Sleep staging by bidirectional long short-term memory convolution neural network. J. Future Gener. Comput. Syst. 2020, 109, 188–196.
24. Yu, Y.; Si, X.; Hu, C.; Zhang, J. Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. J. Neural Comput. 2019, 31, 1235–1270.
25. Graves, A.; Mohamed, A.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
26. Zheng, X.; Chen, W. An Attention-based Bi-LSTM Method for Visual Object Classification via EEG. J. Biomed. Signal Process. Control 2021, 63, 102174.
27. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.-S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A database for emotion analysis using physiological signals. J. IEEE Trans. Affect. Comput. 2011, 3, 18–31.
28. Huang, D.; Chen, S.; Liu, C.; Zheng, L.; Tian, Z.; Jiang, D. Differences First in Asymmetric Brain: A Bi-hemisphere Discrepancy Convolutional Neural Network for EEG Emotion Recognition. J. Neurocomput. 2021, 448, 140–151.
29. Wang, X.W.; Nie, D.; Lu, B.L. Emotional state classification from EEG data using machine learning approach. J. Neurocomput. 2014, 129, 94–106.
30. Thammasan, N.; Moriyama, K.; Fukui, K.-I.; Numao, M. Familiarity effects in EEG-based emotion recognition. J. Brain Inform. 2017, 4, 39–50.
31. Mert, A.; Akan, A. Emotion recognition from EEG signals by using multivariate empirical mode decomposition. J. Pattern Anal. Appl. 2018, 21, 81–89.
32. Joshi, V.M.; Ghongade, R.B. EEG based emotion detection using fourth order spectral moment and deep learning. J. Biomed. Signal Process. Control 2021, 68, 102755.
33. Li, J.; Zhang, Z.; He, H. Hierarchical convolutional neural networks for EEG-based emotion recognition. J. Cogn. Comput. 2018, 10, 368–380.
34. Yea-Hoon, K.; Sae-Byuk, S.; Shin-Dug, K. Electroencephalography Based Fusion Two-Dimensional (2D)-Convolution Neural Networks (CNN) Model for Emotion Recognition System. J. Sens. 2018, 18, 1383.
35. Zhu, Y.; Zhong, Q. Differential Entropy Feature Signal Extraction Based on Activation Mode and Its Recognition in Convolutional Gated Recurrent Unit Network. Front. Phys. 2021, 8, 9620.
36. Chang, Q.; Li, C.; Tian, Q.; Bo, Q.; Zhang, J.; Xiong, Y.; Wang, C. Classification of First-Episode Schizophrenia, Chronic Schizophrenia and Healthy Control Based on Brain Network of Mismatch Negativity by Graph Neural Network. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 1784–1794.
Figure 1. The framework of the proposed EEG-based emotion recognition model DE-CNN-BiLSTM.
Figure 2. The spatial mapping of the DE features in four frequency bands.
Figure 3. The spatial structure distribution of the CNN model.
Figure 4. The structure of the Bi-LSTM.
Figure 5. (a) Topology changes in the DE at 10–60 s slices in four frequency bands at positive emotion. (b) Topology changes in the DE at 10–60 s slices in four frequency bands at negative emotion.
Figure 6. Training progress of the model in terms of training and validation accuracy for the emotional dimension of Valence.
Figure 7. (a) Distribution of the emotion recognition accuracy of the DEAP dataset on Valence and Arousal. (b) Distribution of the emotion recognition accuracy of the SEED dataset.
Table 1. Comparison of results of different emotion recognition models.

Dataset        Model                   Feature Information   Arousal Accuracy (%)   Valence Accuracy (%)
DEAP dataset   SVM [30]                PSD                   73.30                  72.50
               ANN [31]                MEMD                  75.00                  72.87
               Bi-LSTM [32]            LF-DfE                76.00                  75.50
               CNN [33]                Wavelet Transform     78.12                  81.25
               CNN [34]                DE                    88.20                  86.20
               2DCNN-BiGRU [35]        DE                    87.89                  88.69
               CNN-LSTM [20]           DE                    90.24                  89.45
               DE-CNN-BiLSTM (ours)    DE                    94.86                  94.02