This section comprehensively presents the experimental evaluation of the proposed model on a self-constructed dataset, including the experimental setup, the experiments conducted, and the analysis of the results. Additionally, it reveals the activation of brain regions and their collaborative patterns during states of musical recall and creation.
3.1. Implementation Details
In EEG recognition tasks, experiments are commonly conducted in either a subject-dependent or subject-independent manner. Given the highly subjective nature of music recall and creation, which can be influenced by individual personality, education level, experience, and other factors, our model adopts a subject-dependent training approach [31]. For this experiment, we train a separate model for each participant and employ ten-fold cross-validation to determine each subject's accuracy. The final recognition rate of the model is the average classification accuracy across all subjects.
The model's learning rate follows a staged schedule: training starts from an initial value, which is reduced when the recognition rate reaches 70% and lowered again upon reaching 85% accuracy. Each subject's model undergoes 50 iterative updates with a batch size of 32. This batch size can be adjusted based on the training conditions of individual subjects, with the majority achieving optimal results at batch sizes of 32 or 64. In addition to using the overall recognition rate as an evaluation criterion, this study also analyzes the recognition rate for each state and the contribution rate of each frequency band, aiming to comprehensively evaluate the classification performance and effectiveness of our model from multiple perspectives. A minimal sketch of this protocol appears below.
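The following is a minimal sketch of the subject-dependent protocol under stated assumptions: the model interface (`model_fn`, `fit_epoch`, `score`) is hypothetical, and the learning-rate values are illustrative placeholders, since the specific rates are not reproduced here.

```python
# Sketch of per-subject training with ten-fold cross-validation and a staged
# learning-rate schedule keyed to the 70% / 85% accuracy milestones above.
import numpy as np
from sklearn.model_selection import KFold

def train_subject(model_fn, X, y, n_epochs=50, batch_size=32):
    """Train one model for a single subject; return the 10-fold mean accuracy."""
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    fold_acc = []
    for train_idx, test_idx in kf.split(X):
        model = model_fn()           # hypothetical factory for the proposed network
        lr = 1e-3                    # placeholder initial rate (not given in the text)
        for _ in range(n_epochs):
            acc = model.fit_epoch(X[train_idx], y[train_idx],
                                  lr=lr, batch_size=batch_size)
            if acc >= 0.85:          # second reduction at 85% accuracy
                lr = 1e-5            # placeholder value
            elif acc >= 0.70:        # first reduction at 70% recognition rate
                lr = 1e-4            # placeholder value
        fold_acc.append(model.score(X[test_idx], y[test_idx]))
    return float(np.mean(fold_acc))  # averaged over subjects for the final rate
```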
3.2. Performance Comparison with Related Works
This study compares the proposed method with established approaches from public datasets and evaluates its performance on a specially designed dataset, with detailed results presented in Table 4. The model exhibits a substantial improvement in accurately differentiating cognitive states elicited by external musical stimuli from those arising from internally recalled creative processes, across both happy and sad emotional music contexts. For untrained participants, accuracy improved by 1.55% for cognitive states related to happy music and by 4.69% for sad music. Participants with a musical background showed improvements of 1.39% for happy music-related states and 2.96% for sad music-related states. Notably, the model achieved considerable performance gains on cognitive states associated with sad music, a scenario in which most models typically exhibit lower recognition rates. These findings highlight the effectiveness of our approach, which prioritizes the extraction of information from brain regions that are highly active in response to specific emotional music stimuli and selectively disregards less pertinent data, thereby facilitating more effective processing of complex cognitive activities.
A confusion matrix is employed to derive additional metrics from the fundamental counts (true positives/TP, true negatives/TN, false positives/FP, and false negatives/FN). Classification accuracy (ACC) is the ratio of correct predictions to the total number of predictions. However, ACC alone is not an adequate measure of the proposed algorithm's overall performance. Hence, further statistical performance metrics, including precision and specificity, are also reported. The definitions are given below.
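Expressed in terms of the four fundamental counts, the standard definitions of these metrics are:

```latex
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
```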
Based on these metrics, we recalculated the performance of each method, as shown in Table 5 below.
To precisely analyze this model's ability to differentiate the various states relative to the best-performing model in the comparison group, SFCSAN, confusion matrices and ROC curves were constructed; the detailed results are shown in Figure 7. Although both models exhibit similar performance in recognizing music-stimulation states, the proposed model achieves an 8% improvement in distinguishing between music recall and creation states. The ROC curves likewise show the superiority of our method. This result highlights the model's advantage in extracting distinctive and relevant information associated with the music recall and creation states. Further analysis of this information could reveal potential mechanisms of brain activity during the music creation process.
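As a minimal sketch of how such a per-model comparison can be computed with scikit-learn, assuming each model yields held-out labels `y_true` and class-1 scores `y_score` (the 0.5 binarization threshold is illustrative):

```python
# Confusion matrix and ROC statistics for one model's held-out predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc

def evaluate(y_true, y_score, threshold=0.5):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    cm = confusion_matrix(y_true, y_pred)     # rows: true class, columns: predicted class
    fpr, tpr, _ = roc_curve(y_true, y_score)  # binary ROC from class-1 scores
    return cm, fpr, tpr, auc(fpr, tpr)        # AUC summarizes the ROC curve
```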
3.3. Visualization of the SSAM Network
The proposed model excels at extracting discriminative information, specifically targeting brain regions rich in music-related content. Electrodes over these regions are posited to carry a greater abundance of music-related information, which is instrumental for correlating brain electrical activity with music. The attention map produced by the SSAM attention network was analyzed by applying a threshold and visualizing the surviving weights; the results are shown in Figure 8.
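A minimal sketch of the thresholding step, assuming `attn` is a vector of per-electrode attention weights from the SSAM network; the min-max normalization and the threshold value are illustrative choices, as the text does not specify them:

```python
# Keep only electrodes whose normalized attention weight clears a threshold.
import numpy as np

def salient_electrodes(attn, channel_names, threshold=0.6):
    attn = np.asarray(attn, dtype=float)
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)  # scale to [0, 1]
    keep = attn >= threshold                                       # illustrative cutoff
    return [name for name, k in zip(channel_names, keep) if k]
```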
The attention map shows that the brain regions most associated with music recall and creation are predominantly on the left side, particularly concentrated in the left prefrontal lobe (FP1 and FPZ), left temporal lobe (T7), and occipital lobe (O1 and OZ). When comparing subjects without musical training to professionally trained subjects, the left temporal region (M1 and C3) in the latter group exhibited heightened activation. Moreover, additional activation was observed in the right occipital lobe (O2 and P8). The behavior of these regions likely stems from the shaping influence of music on the brain. It is hoped that elucidating these areas will offer valuable insights and support for EEG-based music generation.
Figure 9 shows the contribution of each of the four frequency-band outputs from the ECA module. Individuals without musical training predominantly rely on EEG information in the alpha band. In contrast, those with musical training carry more information specifically related to music creation in the gamma band. The contributions from the beta and theta bands did not differ significantly. A comparison of the recognition-rate differences between the two datasets suggests that, for music-related complex cognitive activities, the high-frequency gamma band contains more relevant information than the low-frequency alpha band, indicating the need for greater emphasis on information within this frequency range.
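A sketch of how per-band contribution rates could be read out of an ECA-style channel attention is given below. The layout follows the standard ECA design (global average pooling, a 1D convolution, and a sigmoid); treating the four frequency bands as the attended channels is an assumption about the paper's configuration, not a confirmed detail.

```python
# ECA-style attention over four band-specific feature maps; the mean sigmoid
# weight per band serves as that band's contribution rate.
import torch
import torch.nn as nn

class BandECA(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                        # x: (batch, 4 bands, H, W)
        w = x.mean(dim=(2, 3))                   # global average pool per band
        w = torch.sigmoid(self.conv(w.unsqueeze(1)).squeeze(1))  # (batch, 4)
        self.band_contribution = w.mean(dim=0)   # contribution rate per band
        return x * w[:, :, None, None]           # reweight band features
```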
3.4. Activation Analysis of Brain Regions in Music-Related Cognitive Activities
The focus of this section is the activation analysis of brain regions during music-related cognitive activities, specifically examining whether the methods typically used in emotion recognition are applicable to more complex cognitive tasks such as music recall and creation. In emotion recognition studies, researchers often employ a time-slicing method with overlapping EEG windows, typically ranging from 1 to 3 s [40,41,42], to segment EEG signals. This approach not only expands the sample size but also allows deep learning algorithms to extract richer information, achieving commendable results in emotion recognition studies. However, the suitability of this method for more complex cognitive activities [43,44], such as those involved in musical creativity, requires further experimental validation. The present study selected the individuals with the highest recognition rates from the two groups of subjects. After preprocessing the original EEG signal, a two-second window was used to calculate the feature matrix sequence; compared to a one-second window, the data from the two-second window were smoother and less influenced by transient stimuli. The correlation of these data with the SSAM matrix generated by the proposed model was then calculated, with a threshold of 0.6 set to identify relevant information; locations with a correlation coefficient below this threshold were considered to contain little information pertinent to the music recall and creation states. A sketch of this procedure appears below.
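The following is a minimal sketch of the windowed correlation check, under stated assumptions: feature extraction is abstracted behind a hypothetical `extract_features`, and the step size equals the window length (the text does not state whether these windows overlap).

```python
# Slide a 2 s window over preprocessed EEG, correlate each window's feature
# matrix with the SSAM matrix, and flag windows above the r = 0.6 threshold.
import numpy as np

def window_correlations(eeg, ssam, fs, extract_features, win_s=2.0, r_min=0.6):
    step = int(win_s * fs)                   # non-overlapping step (assumption)
    rs, flags = [], []
    for start in range(0, eeg.shape[-1] - step + 1, step):
        feat = extract_features(eeg[..., start:start + step])  # per-window features
        r = np.corrcoef(feat.ravel(), ssam.ravel())[0, 1]      # Pearson r vs. SSAM
        rs.append(r)
        flags.append(r >= r_min)             # threshold from the text
    return np.array(rs), np.array(flags)
```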
The heat map presented in Figure 10 illustrates that, regardless of the emotional context, the characteristic patterns of brain activity associated with the states of music recall and creation are not continuously active but exhibit intermittent activation. This suggests that the brain regions involved in processing music-related tasks are not persistently active but are instead subject to frequent activations over brief intervals. This phenomenon is more pronounced in participants without musical training. In contrast, individuals with a musical background, particularly those with high recognition rates, exhibit shorter activation intervals and stronger activation intensity during music recall and creation states, suggesting that musical training may enhance brain activity and creativity.
Furthermore, the heatmap results suggest that while the time-slicing technique is effective at capturing emotion-related information, it may have limitations in identifying the subtle nuances of more complex cognitive activities, such as music composition. This finding underscores the advantage of the proposed method in capturing relevant information across different time frames in complex cognitive tasks.
3.5. Network Connectivity Analysis in Music-Related Cognitive Activities
Using the SSAM network, the electrodes most closely associated with music recall and creation states were identified. This paper further investigated how other brain regions work in tandem with these identified electrode areas. To this end, this study explored the correlation of temporal feature sequences among different electrodes, seeking to discern collaborative patterns within these regions. The heatmap matrices depicted in Figure 11 and Figure 12 showcase the working synergy between the brain areas corresponding to these electrodes. Circle plots based on set thresholds further reveal the functional connections that are most relevant to these key areas. The results are presented in Figure 13.
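A minimal sketch of this connectivity computation, assuming `sequences` is an (electrodes × time) array of the per-electrode temporal feature sequences; the threshold value is illustrative:

```python
# Pairwise correlation of electrode feature sequences; edges above the
# threshold feed the heatmap matrices and circle plots.
import numpy as np

def connectivity(sequences, threshold=0.6):
    corr = np.corrcoef(sequences)                            # electrode x electrode matrix
    edges = np.argwhere(np.triu(np.abs(corr) >= threshold, k=1))  # upper-triangle pairs
    return corr, edges                                       # matrix for the heatmap, pairs for the circle plot
```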
The analysis indicates a significant correlation between electrodes closely related to music recall and creation and other electrodes. The heatmap matrix reveals that individuals trained in music exhibit brain activation levels significantly greater than those reflected by the SSAM network, even in specific regions that may not contain much state-specific information.
During happy emotions, individuals without musical training predominantly display synchronized brain activity in the frontal (FP1 and FP2), vertex (CZ), left occipital (O1, O2, and POZ), right temporal (F4 and C4), and right occipital (O2) regions, with the central brain region showing less obvious activation in higher frequency bands. In contrast, those with musical training show more intense synergistic activity in the left frontal (FP1 and F3) and left temporal (CP5 and P3) regions, vertex area (CP2), and right temporal area (F4 and T8), demonstrating pronounced collaboration with key brain regions. The activation in the parietal and temporal regions of these individuals is notably extensive.
As shown in Figure 14, Figure 15 and Figure 16, in sad emotional states, both professionally trained and untrained participants exhibit noticeable activation in the prefrontal and right occipital cortices.
Participants without musical training demonstrate a relatively simple brain-cooperation pattern, with only a few electrodes showing significant correlations with surrounding regions. Notably, the left frontal (FP1 and F3) and occipital (O1, O2, and OZ) regions, as well as the vertex (FZ and FC1) and right temporal areas (P8 and M2), exhibit strong synergistic activity. In contrast, individuals with musical training exhibit a more complex pattern of brain cooperation. While their inter-electrode correlations are not as strong as those of untrained participants, the areas involved are more extensive: the frontal (FP2), occipital (POZ and O2), and temporal (M1 and T8) regions all show signs of collaboration. This finding suggests that musical training may reshape brain structures. Untrained individuals rely on specific, closely interconnected brain regions to process music information. Conversely, musical training expands the brain's processing areas, enabling different regions to handle various musical elements and thereby increasing processing efficiency. This pattern is corroborated by the higher recognition rates observed in the dataset, further emphasizing the plasticity of the brain and the profound impact of music on it. These findings could help elucidate the complex relationship between the brain and music, particularly the potential mechanisms by which electroencephalography can be transformed into musical compositions [45].
In light of the relatively small sample size, we performed a rigorous significance analysis to ensure the robustness of our findings, applying a p < 0.05 threshold for statistical relevance. The outcomes of this analysis are displayed in Table 6 and Table 7. Electrodes exhibiting significant correlations under the various emotional music stimuli were predominantly localized in the left frontal (FP1 and FPZ) and occipital regions (O1, OZ, and O2). Notably, Table 6 reveals that, in untrained participants, electrodes with higher significance were concentrated in the α band, whereas Table 7 indicates that, in participants with musical training, significant electrode activity was primarily observed in the γ band. These patterns suggest a differential neural processing response to musical training, highlighting potential areas of interest for further cognitive and neurophysiological research.
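A minimal sketch of one way such a per-electrode significance test can be run with SciPy; pairing each electrode's feature time series against a common reference sequence is an assumption about how the test was set up, not a detail given in the text.

```python
# Per-electrode Pearson correlation test; keep electrodes with p < 0.05.
from scipy.stats import pearsonr

def significant_electrodes(sequences, reference, channel_names, alpha=0.05):
    hits = []
    for name, series in zip(channel_names, sequences):
        r, p = pearsonr(series, reference)   # correlation and two-sided p-value
        if p < alpha:                        # significance threshold from the text
            hits.append((name, r, p))
    return hits
```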