Due to the dynamic nature of, and noise present in, EEG signals, EEG-based emotion recognition has always been a challenging task. Many researchers have employed a wide range of methods to tackle this problem. In this section, we review works that mainly utilize machine learning and deep learning approaches.
2.1. Machine Learning Approach
Machine learning is a widely used approach in EEG emotion recognition. It often starts with preprocessing the raw signal and extracting hand-crafted features. The features are then fed into a machine learning model, such as a support vector machine, K-nearest neighbor, decision tree, etc., to classify emotional states.
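As a concrete illustration of this pipeline, the sketch below extracts simple per-channel statistical features from synthetic epochs and classifies them with a minimal K-nearest-neighbor vote. The data, channel count and feature choice are illustrative assumptions, not taken from any cited work.

```python
import numpy as np

def extract_features(epoch):
    # Hand-crafted statistics per channel: min, max, mean, standard deviation
    return np.concatenate([epoch.min(axis=1), epoch.max(axis=1),
                           epoch.mean(axis=1), epoch.std(axis=1)])

def knn_predict(train_X, train_y, x, k=3):
    # Minimal K-nearest-neighbor majority vote using Euclidean distance
    nearest = train_y[np.argsort(np.linalg.norm(train_X - x, axis=1))[:k]]
    return np.bincount(nearest).argmax()

rng = np.random.default_rng(0)
# Synthetic "EEG": 20 epochs x 4 channels x 128 samples; class 1 has larger variance
epochs = np.stack([rng.normal(0, 1 + (i % 2), (4, 128)) for i in range(20)])
labels = np.arange(20) % 2
X = np.stack([extract_features(e) for e in epochs])
pred = knn_predict(X[:-1], labels[:-1], X[-1])
```

In a real study, the synthetic epochs would be replaced by preprocessed EEG trials and the classifier by a tuned model such as an SVM.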
Many studies have been carried out that focus on evaluating the effectiveness of different features. Ref. [
17] explores power spectral density, differential asymmetry and rational asymmetry of paired channels under multiple frequency bands. These features are processed by a support vector machine to recognize emotions. It finds that differential asymmetry is more robust for detecting the brain dynamics caused by emotions. Moreover, information provided by channels from the frontal and parietal lobes is useful for distinguishing emotions. Ref. [
18] conducts studies on emotion classification with different features as input. During the process, feature dimensionality reduction techniques, such as principal component analysis and linear discriminant analysis, are adopted to improve efficiency and accuracy. The experimental results indicate that the power spectrum is the most effective among all input features and that high frequency bands tend to be more useful in emotion classification. These studies show that the choice of input features can largely affect the results.
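To make the band-power and asymmetry features discussed above concrete, the following sketch computes average FFT-periodogram power in standard frequency bands and a differential asymmetry (DASM) value for one hypothetical left–right channel pair. The sampling rate, band limits and channel pair are illustrative assumptions.

```python
import numpy as np

def band_power(signal, fs, band):
    # Average periodogram power of the signal within [band[0], band[1]) Hz
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

fs = 128                                    # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
left = rng.normal(size=4 * fs)              # e.g. a left-hemisphere channel (F3)
right = rng.normal(size=4 * fs)             # its right-hemisphere pair (F4)

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
# Differential asymmetry (DASM): band-power difference of the paired channels
dasm = {name: band_power(left, fs, b) - band_power(right, fs, b)
        for name, b in bands.items()}
```

Rational asymmetry would instead take the ratio of the two band powers; practical implementations typically use a windowed estimator such as Welch's method rather than the raw periodogram.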
To compare which classifiers have the best performance, Ref. [
6] utilizes statistical features, i.e., min, max, mean and standard deviation, as the input. It then adopts K-nearest neighbor, regression tree, Bayesian network, support vector machine and artificial neural network classifiers. The experiments show that the K-nearest neighbor and support vector machine give the best results among all the models. However, it can be challenging for the majority of machine learning methods to work well with large datasets. Ref. [
19] employs discrete wavelet transform and spectral features. In the classification stage, it applies a support vector machine with a radial basis function kernel to classify features from 10 channels. Ref. [
20] employs empirical mode decomposition with intrinsic mode functions and variational mode decomposition (VMD), which are widely used in biomedical studies, to process the raw EEG signal. Empirical mode decomposition breaks the nonlinear and non-stationary signal into intrinsic mode functions, which are fed into VMD to identify low and high frequencies. Two non-linear features are then extracted: entropy and Higuchi’s fractal dimension. Finally, experiments are carried out using Naive Bayes, K-nearest neighbor, decision tree and convolutional neural network classifiers to recognize emotions. A common observation from these studies is that the support vector machine often generates the best outcomes in emotion classification tasks.
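Higuchi's fractal dimension, mentioned above, can be computed with the standard curve-length construction; the sketch below is a minimal NumPy version (the choice of `k_max` is an assumption). For white noise the estimate approaches 2, while for a smooth ramp it approaches 1.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    # Higuchi's fractal dimension of a 1-D signal via normalized curve lengths
    n = len(x)
    lk = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)           # subseries with step k, offset m
            if idx.size < 2:
                continue
            # Normalized curve length of the subseries
            length = np.abs(np.diff(x[idx])).sum() * (n - 1) / ((idx.size - 1) * k)
            lengths.append(length / k)
        lk.append(np.mean(lengths))
    # The fractal dimension is the slope of log L(k) against log(1/k)
    ks = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lk), 1)
    return slope

rng = np.random.default_rng(0)
white_noise = rng.normal(size=1024)   # expected FD near 2
line = np.linspace(0, 1, 1024)        # expected FD near 1
```

The slope is estimated by a least-squares fit, following the usual formulation of Higuchi's method.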
2.2. Deep Learning Approach
Recently, extensive research efforts have been devoted to deep learning techniques for EEG-based emotion identification due to their robustness and low requirement for prior knowledge. These techniques can be generally classified according to the type of network used, as those with similar architectures tend to follow analogous ideas.
In previous studies, a common class of deep learning model for EEG-based emotion classification is long short-term memory (LSTM), which is typically designed to capture temporal dependencies within data sequences. Ref. [
21] directly inputs the EEG signal into the LSTM network by treating channels as features for each time frame. Similarly, Ref. [
7] computes the discrete wavelet transform from the raw signal, followed by the extraction of statistical data. These extracted features are then fed into a network architecture that combines LSTM layers with dense layers for each individual channel. Ref. [
15] further extends LSTM with a domain adversarial neural network. It extracts features from each hemisphere using an LSTM-based approach, and the domain adversarial network is adopted to address the challenge of cross-subject variability. These studies demonstrate the strength of LSTM in effectively capturing temporal characteristics from EEG data. A potential drawback of LSTM, however, is its limited ability to learn the spatial connections among EEG channels.
Another type of network widely adopted in emotion recognition with EEG signals is the convolutional neural network (CNN), which slides a shared-weight kernel over the data. It is primarily utilized in image analysis due to its advantages in processing data with grid patterns. Ref. [
22] examines the power of CNN in terms of architecture, design and training decisions. The results indicate that CNN is capable of learning highly discriminative features when given the proper conditions. Ref. [
23] adopts a 3D convolution layer, which is able to learn spatial and temporal features simultaneously. It requires a 3D input representation for the EEG signal by appending consecutive frames together. Ref. [
24] develops a compact convolutional architecture for EEG-based brain–computer interfaces (BCI). It introduces separable and depthwise convolutions, which not only extract interpretable features but also reduce the number of parameters. Ref. [
14] uses multi-scale convolutional layers to extract temporal and spatial features. It specifically considers the asymmetrical property of the frontal area of the brain. These studies demonstrate that CNN is capable of processing both temporal and spatial aspects of EEG signals. However, an issue that often comes with CNN is its inflexibility in modeling the relationships among channels or areas. As CNN by nature presumes a grid pattern in the input data, it is challenging for CNN to investigate non-Euclidean connectivity. This problem can instead be handled by a graph-based approach.
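The parameter savings from the depthwise and separable convolutions exploited in the compact architecture of Ref. [24] can be illustrated with a simple count (bias terms are ignored, and the channel counts and kernel length below are arbitrary examples, not values from that work):

```python
def standard_conv_params(c_in, c_out, k):
    # One k-length kernel per (input channel, output channel) pair
    return c_in * c_out * k

def separable_conv_params(c_in, c_out, k, depth_multiplier=1):
    depthwise = c_in * depth_multiplier * k       # one kernel per input channel
    pointwise = c_in * depth_multiplier * c_out   # 1x1 convolution mixing channels
    return depthwise + pointwise

# Example: 64 input channels, 64 output channels, kernel length 16
std = standard_conv_params(64, 64, 16)    # 64 * 64 * 16 = 65536
sep = separable_conv_params(64, 64, 16)   # 64 * 16 + 64 * 64 = 5120
```

Factoring the convolution into a depthwise and a pointwise step cuts the parameter count by roughly an order of magnitude in this example, which is why such designs suit the small datasets common in BCI work.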
A graph neural network (GNN) is a class of networks that represents data in a graph structure. In EEG tasks, a graph is often constructed by treating each channel as a node, while the formulation of edges can vary. One option is to utilize spatial proximity. Ref. [
25] builds a 2D matrix to mark the relative position of electrodes. Then, the adjacency matrix is obtained by thresholding the shortest distance between a node and its neighbors. Ref. [
26] establishes the connectivity based on an inverse square function of the physical distance. However, as argued by [27], these spatial formulations may not represent the real functional connections between channels. To address this problem, it proposes a dynamical graph convolutional neural network that can dynamically learn the intrinsic relationships between nodes. In works that employ GNNs, the advantage of exploring topological structure makes them more adaptable for investigating the relationships between channels.
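The two spatial edge formulations described above can be sketched as follows; the electrode coordinates and the distance cutoff are illustrative assumptions rather than layouts from the cited works.

```python
import numpy as np

# Hypothetical 2-D electrode coordinates (arbitrary units)
coords = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

# (a) Thresholded adjacency: connect electrodes within a distance cutoff
adj_thresh = ((dist > 0) & (dist <= 1.0)).astype(float)

# (b) Inverse-square adjacency: edge weight decays with physical distance
adj_inv = np.zeros_like(dist)
mask = dist > 0
adj_inv[mask] = 1.0 / dist[mask] ** 2
```

Both constructions yield symmetric adjacency matrices with empty diagonals; a learned formulation, by contrast, would treat these weights as trainable parameters.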
To improve performance on both the spatial and temporal levels, hybrid networks composed of different types of networks are used. From the perspective of signal decomposition, Ref. [
28] proposes a model that derives the source signal by a stacked autoencoder (SAE). Next, the sequenced features are fed into an LSTM network to learn the contextual correlation. Ref. [
29] proposes a model that first captures spatial features with convolution layers at each timestamp and then feeds them into an LSTM layer. The novelty of this work is that it adopts an attention mechanism in both stages to capture which channel or timestamp contributes more to emotion recognition. Ref. [
30] employs a combination of GNN and LSTM, where GNN is responsible for learning static graph-domain features and LSTM extracts effective information from the channel-level relationships in a short range of time. Recently, the study of spatial-temporal graph learning has also been employed in EEG emotion classification. Ref. [
31] integrates the spatial graph convolutional network with an attention-enhanced bi-directional LSTM module. This type of model better combines the temporal information to learn the features.
In addition to the aforementioned approaches, some other novel methods have emerged in the field of EEG-based emotion classification, providing different directions for advancing deep learning techniques. One of these methods, Ref. [
32], focuses on a real-time method, which employs online learning techniques, including adaptive random forest, streaming random patches and logistic regression. Ref. [
33] utilizes a capsule network to extract hierarchical features from the EEG signal, where each emotional capsule is associated with an individual task. To enhance the power of multi-task learning, it uses a dynamic routing algorithm to achieve information exchange between primary capsules and emotional capsules. Recently, reinforcement learning has gained attention in EEG emotion classification as well. An example is [
34], a reinforcement learning-based method that builds on Papez circuit theory and uses EEG signals from the frontal lobe to simulate brain mechanisms. Its key contribution is the use of a double dueling deep Q network, which enhances the decision-making process with more informed choices. These various methods have significantly advanced the field of deep learning for EEG emotion classification.