4.2. EEG Emotion Recognition Experiment on the SEED Dataset
- (1)
Subject-Independent Experiment on the SEED Dataset
We investigated five common EEG features to evaluate the proposed emotion recognition method: differential entropy (DE), power spectral density (PSD), differential asymmetry (DASM), rational asymmetry (RASM), and differential caudality (DCAU). These features were extracted from five frequency bands (delta, theta, alpha, beta, and gamma). To extract the EEG features, the signals were first downsampled to a 200 Hz sampling rate, and recordings contaminated by electromyography (EMG) and electrooculography (EOG) artifacts were manually removed. The EOG signals were used to identify blink artifacts, and a bandpass filter between 0.3 and 50 Hz was applied to remove noise. The EEG data for each movie clip were divided into 1 s non-overlapping segments, and features were extracted from each segment. The number of EEG features extracted from each frequency band and a summary of the different feature types can be found in
Table 1. The data input to the model has a dimensionality of (64, 62, 5). In this way, we ensure that the multi-dimensional characteristics of emotional EEG signals are comprehensively captured, providing rich information for emotion recognition.
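As a concrete illustration of this feature pipeline, the sketch below shows how per-band DE features of this shape could be computed for one movie clip. It is a minimal sketch under stated assumptions: the band boundaries follow the delta/theta/alpha/beta/gamma convention commonly used with SEED, the DE formula assumes Gaussian-distributed segments, and the helper names are ours rather than the exact preprocessing code used in this work.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200  # sampling rate after downsampling (Hz)
# Assumed band boundaries (Hz); the conventional five-band split used in the SEED literature.
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13), "beta": (14, 30), "gamma": (31, 50)}

def bandpass(x, low, high, fs=FS, order=4):
    """Zero-phase band-pass filter along the time axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)

def de_features(clip):
    """clip: (62, T) EEG of one movie clip -> (n_segments, 62, 5) DE features.

    For a Gaussian-distributed 1 s segment, DE reduces to 0.5 * log(2 * pi * e * variance).
    """
    seg_len = FS  # 1 s non-overlapping segments
    n_seg = clip.shape[-1] // seg_len
    feats = np.zeros((n_seg, clip.shape[0], len(BANDS)))
    for b_idx, (low, high) in enumerate(BANDS.values()):
        filtered = bandpass(clip, low, high)
        for s in range(n_seg):
            seg = filtered[:, s * seg_len:(s + 1) * seg_len]
            feats[s, :, b_idx] = 0.5 * np.log(2 * np.pi * np.e * np.var(seg, axis=-1))
    return feats  # batched into (64, 62, 5) mini-batches for the model
```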
To ensure the reproducibility of the model, we used the following hyperparameter configuration in the experiments: a batch size of 64, 40 training epochs, and a learning rate of 0.01 during the training phase. The batch size in the testing phase was the same as in the training phase. To guarantee the reproducibility of the results, we set the random seed (seed = 100) and used GPU acceleration for the computation. The optimizer was Adam, and CrossEntropyLoss was employed as the loss function. Domain adaptation was implemented using a GRL, with the α parameter, which controls the strength of domain adaptation during training, set to 0.1. Specifically, the domain classifier was implemented as a fully connected neural network with 100 hidden units, batch normalization, ReLU activations, and a LogSoftmax output. We also conducted experiments with different values of the α parameter and found that other values decreased both the accuracy and the standard deviation by approximately 2–3%. All experiments were conducted with these fixed hyperparameter settings, ensuring that the reported classification accuracies are reliable and reproducible under the same configuration.
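A minimal PyTorch sketch of the adversarial branch described above is given below. Only the quantities stated in the text (100 hidden units, batch normalization, ReLU, LogSoftmax, α = 0.1, seed 100, Adam, learning rate 0.01) are taken from the experiment description; the class and variable names are illustrative, and the AttGraph feature extractor itself is assumed to be defined elsewhere.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -alpha in the backward pass."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

class DomainClassifier(nn.Module):
    """Domain discriminator fed through the GRL: 100 hidden units, BN, ReLU, LogSoftmax output."""
    def __init__(self, feat_dim, n_domains=2, alpha=0.1):
        super().__init__()
        self.alpha = alpha
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 100),
            nn.BatchNorm1d(100),
            nn.ReLU(),
            nn.Linear(100, n_domains),
            nn.LogSoftmax(dim=1),
        )

    def forward(self, features):
        return self.net(GradReverse.apply(features, self.alpha))

# Fixed training configuration reported above (the emotion branch uses CrossEntropyLoss;
# a LogSoftmax head such as the one above is typically paired with nn.NLLLoss for the domain loss).
torch.manual_seed(100)
# model = AttGraph(...).cuda()                               # assumed model definition
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # batch size 64, 40 epochs
# criterion = nn.CrossEntropyLoss()
```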
In the subject-independent experiment, we used leave-one-subject-out (LOSO) cross-validation to evaluate the performance of the AttGraph model in EEG emotion recognition. Specifically, in each LOSO fold, the EEG data from 14 subjects were used to train the model, while the EEG data from the remaining subject were used as the test data. The procedure was repeated so that each subject's EEG data served as the test set exactly once. Each epoch took approximately 17 s, and the total number of model parameters was about 77,000. We then calculated the average classification accuracy and standard deviation for the five EEG features, with the results shown in
Table 2.
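The LOSO protocol used here can be summarized by the loop below; it is a schematic sketch in which train_fn and eval_fn stand in for the actual training and evaluation routines of the AttGraph model.

```python
import numpy as np

def loso_evaluation(subject_data, subject_labels, train_fn, eval_fn):
    """Leave-one-subject-out: train on 14 subjects, test on the held-out one.

    subject_data / subject_labels: lists with one array per subject (15 subjects for SEED).
    train_fn / eval_fn: user-supplied callables that train a model and return its test accuracy.
    """
    accuracies = []
    for test_idx in range(len(subject_data)):
        train_x = np.concatenate([d for i, d in enumerate(subject_data) if i != test_idx])
        train_y = np.concatenate([y for i, y in enumerate(subject_labels) if i != test_idx])
        model = train_fn(train_x, train_y)  # 40 epochs, batch size 64, lr = 0.01
        accuracies.append(eval_fn(model, subject_data[test_idx], subject_labels[test_idx]))
    # The mean and standard deviation over folds are the values reported per feature type.
    return float(np.mean(accuracies)), float(np.std(accuracies))
```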
As shown in
Table 2, there is a significant difference in the contribution of different EEG features to emotion recognition in the AttGraph model. The differential entropy feature was the most effective at capturing changes in the complexity and uncertainty of brain activity and thus achieved the highest recognition accuracy. The power spectral density features also performed well, as they effectively captured emotional changes in the frequency domain. In contrast, the differential asymmetry and rational asymmetry features performed poorly, possibly due to the dynamic nature of emotional states and the strong influence of individual differences. The differential caudality features provided unique insights into brain region coordination and information flow; although their accuracies were lower than those of the first two features, they still had certain value. These results suggest that the AttGraph model could flexibly utilize different EEG features, dynamically weighting and selecting key features from EEG signals to highlight those with strong discriminative power for emotional state changes, thereby significantly improving the accuracy and robustness of emotion recognition.
To further demonstrate the performance of the AttGraph model, we compared it with several popular EEG emotion recognition models. The comparison results are shown in
Figure 2.
The AttGraph model performed excellently in the EEG emotion recognition tasks, showing significant improvements compared with traditional methods such as SVM and DGCNN. The accuracy of the SVM was 56.73%, with a standard deviation of 16.29%, far below the other methods, indicating considerable limitations of traditional approaches in cross-subject emotion recognition tasks. The TCA achieved an accuracy of 63.64%, with a standard deviation of 14.88%, showing improvement over the SVM but still remaining significantly lower than the AttGraph. Although the DGCNN achieved an accuracy of 79.95%, with a standard deviation of 9.02%, demonstrating the advantages of graph convolution, it still fell short of the AttGraph. In contrast, the AttGraph achieved an accuracy of 85.22%, with a standard deviation of 4.90%, and thus outperformed both the SVM and DGCNN while offering higher stability. The AttGraph's performance was also very close to that of current state-of-the-art models. For instance, BiDANN and BiHDM achieved accuracies of 85.4% and 85.3%, with standard deviations of 7.53% and 6.72%, respectively, while the RGNN had an accuracy of 84.14% and a standard deviation of 6.87%. These results show that the AttGraph was competitive with the most advanced deep learning methods in terms of both accuracy and stability, demonstrating its strong competitiveness and application potential in emotion recognition tasks.
To comprehensively evaluate the contribution of each module in the AttGraph model, we designed ablation experiments by sequentially removing key modules and comparing their performance. The experimental results are shown in
Table 3. First, to verify the contribution of the multi-dimensional attention convolution module, we replaced it with a traditional GCN. The results showed that after removing this module, the model’s accuracy significantly decreased (from 85.22% to 81.75%) and the standard deviation increased. This indicates that the multi-dimensional attention convolution module played a crucial role in capturing and weighting key features across different channels, and thus, significantly improved the model’s accuracy and robustness. Specifically, in the context of emotion EEG signal processing, it effectively captured the correlations between channels, which enhanced the model’s stability.
Next, we replaced the global attention module with simple direct summation. The results showed a modest decrease in accuracy (from 85.22% to 83.06%) after removing this module, indicating that the global attention module played a positive role in optimizing the feature fusion, although its impact was smaller than that of the multi-dimensional attention convolution module. It helped the model retain more critical feature information, which improved the accuracy and stability of the emotion recognition.
Finally, to validate the effect of the gradient reversal module, we removed it. The experimental results indicated a slight decrease in the average accuracy (from 85.22% to 83.41%) and a slight increase in the standard deviation. This suggests that, although the gradient reversal module was essential for cross-subject generalization, the model could still maintain good performance without it, especially when working with a dataset containing a relatively limited number of subjects. In particular, the model still performed well at emotion recognition with standardized data.
Through these ablation experiments, we not only verified the effectiveness of each module for emotional EEG signal recognition but also gained a deeper understanding of the advantages and potential of the AttGraph model. The multi-dimensional attention convolution module played a central role in capturing and weighting the key channel features and thus significantly improved the model's accuracy and robustness. The global attention module optimized the feature fusion, which helped the model retain more important information, although its contribution was not as critical as that of the multi-dimensional attention convolution module. The gradient reversal module enhanced the model's generalization ability through cross-subject domain adaptation; although removing it had a relatively small impact, it still helped to reduce inter-subject differences. These results show that each module in the AttGraph model played an important role in emotional EEG signal recognition, which further validates the model's effectiveness and applicability.
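These three ablations can be organized with simple configuration flags that swap each component for its baseline counterpart; the flag and dictionary names below are illustrative, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class AblationConfig:
    """Toggles for the three ablation variants discussed above."""
    use_multidim_attention: bool = True  # False -> fall back to a plain GCN layer
    use_global_attention: bool = True    # False -> fuse branch outputs by direct summation
    use_grl: bool = True                 # False -> drop the gradient reversal (domain) branch

ABLATIONS = {
    "full model": AblationConfig(),
    "w/o multi-dimensional attention": AblationConfig(use_multidim_attention=False),
    "w/o global attention": AblationConfig(use_global_attention=False),
    "w/o gradient reversal": AblationConfig(use_grl=False),
}
# Each configuration is trained and evaluated under the same LOSO protocol to fill in Table 3.
```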
- (2)
Subject-Dependent Experiment on SEED Dataset
In the subject-dependent experiment, each subject's 15 EEG recordings from a single session were used, with the first 9 as the training set and the remaining 6 as the test set. As in the subject-independent experiment, we compared the AttGraph model with several popular EEG emotion recognition models. The comparison results are shown in
Figure 3.
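Before discussing the results in Figure 3, note that the trial-level split described above amounts to the following per-subject indexing; the function below is a minimal sketch with placeholder argument names.

```python
def subject_dependent_split(trials, labels):
    """Per subject and session: the first 9 of the 15 trials form the training set, the last 6 the test set."""
    train_x, train_y = trials[:9], labels[:9]
    test_x, test_y = trials[9:], labels[9:]
    return (train_x, train_y), (test_x, test_y)
```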
As can be seen from the figure, the AttGraph model performed excellently in the EEG emotion recognition tasks. Compared with traditional methods, such as the SVM and DBN, the AttGraph significantly improved the accuracy, reaching 97.45%, a notable increase over the SVM (83.99%) and DBN (86.08%). At the same time, the AttGraph also outperformed models such as the DGCNN (90.4%) and BiDANN (92.38%), demonstrating the powerful potential of combining graph convolution and attention mechanisms for emotion recognition. Compared with current state-of-the-art models, such as the RGNN (94.24%) and BF-GCN (97.44%), the AttGraph performed remarkably well, with a lower standard deviation (2.20%), showcasing higher stability and generalization ability. Therefore, the AttGraph demonstrated strong competitiveness in emotion recognition tasks, outperforming traditional methods and matching the most advanced approaches.
4.3. EEG Emotion Recognition Experiments on the SEED-IV Dataset
- (1)
Subject-Independent Experiment on the SEED-IV Dataset
In this experiment, we explored two common EEG features to evaluate the proposed emotion recognition method: differential entropy (DE) and power spectral density (PSD). The features were extracted from the same five frequency bands (delta, theta, alpha, beta, and gamma). The number of EEG features extracted from each frequency band and a summary of the different feature types are shown in
Table 4. Similar to the SEED dataset, the data input into the model had a dimensionality of (64, 62, 5).
In the subject-independent experiment, we still used the LOSO method to evaluate the performance of the AttGraph model in EEG emotion recognition, with an approximate runtime of 9 s per epoch, and calculated the average classification accuracy and standard deviation for the two EEG features. The results are shown in
Table 5.
The experimental results of the AttGraph model on the SEED-IV dataset show that the differential entropy feature could effectively capture the complexity and uncertainty in EEG signals and was related to changes in emotional states. Although this feature performed well in the emotion classification, its classification accuracy was lower than that on the SEED dataset due to the more complex fluctuations of emotional states. The PSD feature, while reflecting the brain's electrical activity intensity across different frequency bands and capturing the frequency components of emotional changes, showed relatively low accuracy in the four-class task. This may be because, in the multi-class setting, the discriminative power of the PSD feature was weaker, so it could not effectively distinguish the subtle differences between emotional states. Next, we compared the AttGraph model with the same models used in the subject-independent experiment on the SEED dataset, and the results are shown in
Table 6.
The experimental results show that the AttGraph model performed exceptionally well in the EEG emotion recognition tasks. Compared with traditional methods, such as the SVM (accuracy of 37.99%) and TCA (accuracy of 56.56%), the AttGraph demonstrated a significant performance improvement, where it achieved an accuracy of 78.36% with a standard deviation of 9.61%, indicating a high accuracy and stability in cross-subject emotion recognition. Compared with methods like the DGCNN (accuracy of 52.82%), the AttGraph further enhanced the recognition performance, and also performed exceptionally well among the state-of-the-art methods, surpassing the BiDANN (accuracy of 65.59%), BiHDM (accuracy of 69.03%), and RGNN (accuracy of 73.84%). These results demonstrate that the AttGraph had strong competitiveness in the emotion recognition tasks, where it effectively overcame the limitations of traditional methods and achieved results that were on par with or surpassed the existing cutting-edge technologies.
To better understand the behavior of the AttGraph model in emotion recognition tasks, we conducted attention visualization experiments. By visualizing the attention weights of each node in the graph convolution network, we analyzed the model’s focus on different electrode nodes under various emotional states.
Figure 4 and
Figure 5 show the importance weights of EEG electrode nodes as processed by the multi-dimensional attention convolution module and the global attention module during the emotion classification task, respectively.
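The node-level visualizations in Figures 4 and 5 can be approximated by averaging the attention weights collected from each module over the test segments and plotting them per electrode, as in the hedged sketch below; how the weights are extracted from the model is an assumption about its internals.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_node_attention(attention_weights, title, top_k=7):
    """attention_weights: (n_segments, 62) per-electrode weights from one attention module."""
    mean_w = attention_weights.mean(axis=0)
    top = np.argsort(mean_w)[::-1][:top_k]  # e.g. electrodes 15, 23, 25, 31, 32, 33, and 50 in Figure 4
    plt.figure(figsize=(10, 3))
    plt.bar(np.arange(len(mean_w)), mean_w)
    plt.bar(top, mean_w[top], color="tab:red")  # highlight the most attended channels
    plt.xlabel("EEG electrode index")
    plt.ylabel("mean attention weight")
    plt.title(title)
    plt.tight_layout()
    plt.show()
```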
As shown in
Figure 4, the electrodes that exhibited higher attention weights (15, 23, 25, 31, 32, 33, and 50) spatially aligned with the emotion-related cortical areas identified in prior neuroimaging studies [24], particularly in the prefrontal and temporal regions known to be involved in affective processing. This indicates that the AttGraph could automatically identify key emotional channels, demonstrating its excellent feature learning and spatial perception abilities. Compared with
Figure 4, the weight distribution in
Figure 5 is sparser, suggesting that the model was able to focus its attention on the most relevant electrode channels during the decision stage, and thus, removed redundant information.
Finally,
Figure 6 presents the confusion matrix for the AttGraph model in the multi-class emotion recognition task, showing the model’s classification accuracy and confusion for the four emotional states: neutral (0), sadness (1), fear (2), and happiness (3).
From a physiological perspective, the high recognition rate of neutral emotions (99.08%) may be related to their stable EEG characteristics with minimal emotional fluctuation, making them easier to distinguish. The low recognition rate of happiness (61.14%) was associated with the complexity of its EEG signals, which involve dynamic activation across multiple brain regions (such as the reward system and prefrontal cortex) and show significant individual differences, making the signals difficult to distinguish accurately. In contrast, the higher recognition rates for sadness and fear (79.92% and 75.27%, respectively) were closely related to the amygdala, which plays a central role in these negative emotions and causes the EEG signals to exhibit clear and stable characteristics in the lower frequency bands, facilitating model recognition. Although sadness and fear share some similarities, their physiological responses differ, which allowed the model to effectively differentiate between these two emotions. Therefore, the AttGraph model aligned well with these physiological characteristics and effectively captured and distinguished the EEG patterns associated with different emotional states.
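For reference, per-class accuracies such as those quoted above can be read off the row-normalized confusion matrix; the snippet below is a generic sketch using scikit-learn, with placeholder label arrays.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# y_true, y_pred: integer labels with 0 = neutral, 1 = sadness, 2 = fear, 3 = happiness
def row_normalized_confusion(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3]).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)  # each diagonal entry is the recall of that emotion class
    return cm
```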
- (2)
Subject-Dependent Experiment on the SEED-IV Dataset
In this experiment, we used the same experimental method and the same EEG emotion recognition models as in the subject-dependent experiment on the SEED dataset. The experimental results are shown in
Table 7.
In the subject-dependent experiments, the AttGraph model demonstrated an outstanding performance, significantly surpassing traditional methods such as the SVM, DBN, and DGCNN. Specifically, the accuracy of the SVM was 56.61%, with a standard deviation of 20.05%, much lower than the other methods, indicating significant limitations of traditional approaches in emotion recognition tasks. The DBN achieved an accuracy of 66.77%, with a standard deviation of 7.38%, showing an improvement over the SVM, but it was still significantly lower than the AttGraph. The DGCNN achieved an accuracy of 69.88%, with a standard deviation of 16.24%, and despite using graph convolution, it still could not outperform the AttGraph. In contrast, the AttGraph achieved an accuracy of 93.92%, with a standard deviation of 2.78%, not only performing better than the SVM, DBN, and DGCNN but also showing higher stability. Compared with the current state-of-the-art models, the AttGraph also performed exceptionally well. For example, the BF-GCN achieved an accuracy of 89.55%, with a standard deviation of 10.95%, while the RGNN achieved an accuracy of 79.37%, with a standard deviation of 10.54%. These results demonstrate that the AttGraph exhibited strong competitiveness in emotion recognition tasks, particularly in the subject-dependent experiments, where its excellent performance and stability proved the model's efficiency and application potential.
To further evaluate the contribution of each module in the AttGraph model to the emotion recognition performance, we conducted an ablation study based on the subject-dependent experiments using the SEED-IV dataset. The experiment involved progressively removing key modules from the model and observing the changes in accuracy to analyze their impact on the overall performance of the model. The experimental results are shown in
Table 8 below.
In the ablation study, we progressively removed key modules from the AttGraph model and observed their impact on performance. When the multi-dimensional attention convolution module was removed and replaced with a traditional GCN, the model accuracy dropped to 88.71%, with a standard deviation of 4.61%, indicating that this module is crucial for capturing complex emotional features in EEG signals. After removing the global attention module, the accuracy decreased to 90.68%, with a standard deviation of 3.10%; this accuracy remained higher than that obtained when the multi-dimensional attention module was removed, suggesting that the global attention module helps the model focus on the key features of emotional changes. When the gradient reversal module was removed, the accuracy dropped to 91.21%, with a standard deviation of 6.81%; although the accuracy decreased, the model still maintained good performance, indicating that this module plays a significant role in domain adaptation. The complete AttGraph model achieved an accuracy of 93.92%, with a standard deviation of 2.78%, showing the best performance and stability. Overall, the multi-dimensional attention convolution module and the global attention module were critical for improving the model accuracy, while the gradient reversal module contributed to enhancing the model's domain adaptation capability.