Article
Peer-Review Record

Cascaded Convolutional Recurrent Neural Networks for EEG Emotion Recognition Based on Temporal–Frequency–Spatial Features

Appl. Sci. 2023, 13(11), 6761; https://doi.org/10.3390/app13116761
by Yuan Luo 1,2, Changbo Wu 1,2,* and Caiyun Lv 1
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 April 2023 / Revised: 25 May 2023 / Accepted: 26 May 2023 / Published: 2 June 2023
(This article belongs to the Special Issue Artificial Intelligence (AI) Applied to Computational Psychology)

Round 1

Reviewer 1 Report

The paper proposed a method to improve emotion recognition accuracy in human-computer interaction using the DEAP dataset. The proposed method uses spatial-temporal-frequency features, a Bi-LSTM algorithm for extracting temporal information, and a CNN algorithm for spatial-frequency attributes. Classification is made with a hybrid deep learning algorithm to distinguish valence, arousal, dominance, and liking and the four quadrants of the valence-arousal space.

Obtained results are satisfactory, and the application structure has elements of novelty.

Author Response

Dear Reviewer, Thank you very much for reviewing my paper during your busy schedule. I have revised the paper in the appropriate places in response to your valuable comments and have made further improvements to the article.

Reviewer 2 Report

Authors should add full words for acronyms used in the article.

The results obtained need to be verified: how was the data divided into train/test sets, and what results were obtained?

The authors should add references where theory is used, such as for Table 1, Figure 4, ...

The author should add a full model of the designed neural network to better clarify the novelty of the research.

The language of the article is not suitable for the research field.

Author Response

Dear Reviewer, Thank you very much for reviewing my paper during your busy schedule. I have revised the paper in the appropriate places in response to your valuable comments.

 

Point 1: Authors should add full words for acronyms used in the article.

 

Response 1: As requested, I have added the full names of the abbreviations used in the article in the appropriate places.

 

Point 2: The results obtained need to be verified: how was the data divided into train/test sets, and what results were obtained?

 

Response 2: All experiments in this article are evaluated with 10-fold cross-validation: the original data are divided equally into 10 folds, and each fold in turn is used as the test set while the other 9 folds form the training set. The average of the 10 results is taken as the final accuracy. All experimental results are based on this evaluation method.
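
The evaluation scheme described in Response 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `k_fold_indices` and `cross_validated_accuracy` are hypothetical, and the shuffling with a fixed seed is an assumption, since the response does not say whether the data were shuffled before splitting.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Shuffling with a fixed seed is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)  # k nearly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def cross_validated_accuracy(evaluate, n_samples, k=10, seed=0):
    """Average the accuracy returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return float(np.mean(scores))
```

Here `evaluate` stands for any user-supplied routine that trains a model on the training indices and returns its accuracy on the test indices; each sample is used for testing exactly once across the 10 folds.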

 

Point 3: The authors should add references where theory is used, such as for Table 1, Figure 4, ...

 

Response 3: As suggested, I have added references in the appropriate places in the article.

 

Point 4: The author should add a full model of the designed neural network to better clarify the novelty of the research.

 

Response 4: I have added the full model architecture at the beginning of Section 3 to illustrate the methodology presented in this paper.

 

Dear Reviewers, I have extensively improved the language throughout the text.

Reviewer 3 Report

The present paper is devoted to emotion recognition based on computational intelligence and cognitive psychology. I have the following specific comments on the present submission:

1. ABSTRACT AND KEYWORDS: The text should include more details related to the experimental dataset and the proposed methodology. Keywords should include more specific items without abbreviations.

2. Section 1. INTRODUCTION and 2 RELATED WORK: I suggest including here more detailed notes related to cognitive science, general methods of digital signal processing, and machine learning. Additional relevant references should be added, including for instance:  

[1] Prochazka A., Vysata O., Marik V.: Integrating the Role of Computational Intelligence and Digital Signal Processing in Education, IEEE Signal Processing Magazine, 38(3): 154-162, 2021

[2] Gannouni S., Aledaily A., Belwafi K., Aboalsamh H.: Emotion detection using electroencephalography signals and a zero-time windowing-based epoch estimation and relevant electrode identification, Scientific Reports, 11, 7071, 2021

3. Section 3. METHODS: Datasets should be better described. Why was frequency downsampling applied, and how was the digital filtering done? How long were the datasets? Were they recorded by the authors? Which EEG channels were used?

4. Fig. 1 and associated text: The segment length should be specified. Why was no overlapping applied? Was any preprocessing (including digital filtering) applied? The time-domain to frequency-domain transform should be added.

5. Table 1 and associated text: It is not clear which electrodes were used for feature extraction. Were mental activities recorded by all electrodes with the same intensity?

6. Section 3.3 NETWORK ARCHITECTURE: The mathematical description should be more precise. Which software tools were used for data processing? And which of the methodology is the authors' own?

7. Section 5 CONCLUSIONS: I suggest specifying the processing goals in more detail. Further research should be better specified.

8. All abbreviations should be defined before their first use (including SNR, DWT, CV, …)

 

 

The formal level of the whole text should be improved (no indent after Eqs. (1), (2), ...), fonts in some figures should be enlarged (in Fig. 1, …), and mistakes in English should be corrected.

Author Response

Dear Reviewer, Thank you very much for reviewing my paper during your busy schedule. I have revised the paper in the appropriate places in response to your valuable comments.

 

Point 1: ABSTRACT AND KEYWORDS: The text should include more details related to the experimental dataset and the proposed methodology. Keywords should include more specific items without abbreviations.

 

Response 1: In response to your request for more information in the abstract, I have made changes in the appropriate places in the article.

 

Point 2: Section 1. INTRODUCTION and 2 RELATED WORK: I suggest including here more detailed notes related to cognitive science, general methods of digital signal processing, and machine learning. Additional relevant references should be added.

 

Response 2: As suggested, I have added references in the appropriate places in the article.

 

Point 3: Section 3. METHODS: Datasets should be better described. Why was frequency downsampling applied, and how was the digital filtering done? How long were the datasets? Were they recorded by the authors? Which EEG channels were used?

 

Response 3: The dataset is described in more detail in Section 4.1 of the paper. The processed dataset contains 40 channels: 32 EEG signals, 2 EOG signals (1 horizontal, 1 vertical), 2 EMG signals, 1 GSR signal, 1 respiration belt signal, 1 plethysmograph signal, and 1 temperature signal. The DEAP dataset was originally sampled at 512 Hz, as more than 256 Hz is required to process the EMG and EOG data. In this paper, we do not consider multimodal emotion recognition and downsample the data to 128 Hz for computational reasons. In the pre-processed version of the dataset, digital filtering is implemented with a 4-45 Hz bandpass filter. The entire dataset consists of 1280 samples (32 subjects × 40 experiments), with 63 seconds of EEG signal (3 seconds of baseline + 60 seconds of video stimulation) recorded per sample. The signals were recorded by the experimentalists. All 32 EEG channels were used for feature extraction.
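
The two preprocessing steps named in Response 3 (a 4-45 Hz bandpass and downsampling from 512 Hz to 128 Hz) can be illustrated with the sketch below. It is only an illustration under assumptions: the response does not state the filter design, so an idealised FFT brick-wall filter is swapped in here, and the names `bandpass_fft` and `preprocess` are hypothetical. Because the passband ends at 45 Hz, below the 64 Hz Nyquist limit of the 128 Hz output, keeping every fourth filtered sample does not introduce aliasing.

```python
import numpy as np


def bandpass_fft(x, fs, low=4.0, high=45.0):
    """Zero every FFT bin outside [low, high] Hz (idealised brick-wall filter)."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spectrum[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)


def preprocess(x, fs_in=512, fs_out=128):
    """Bandpass at the input rate, then keep every (fs_in // fs_out)-th sample."""
    factor = fs_in // fs_out
    filtered = bandpass_fft(x, fs_in)
    return filtered[..., ::factor]
```

Applied to one 63-second channel sampled at 512 Hz (32256 samples), `preprocess` returns 8064 samples, that is, 63 seconds at 128 Hz.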

 

Point 4: Fig. 1 and associated text: The segment length should be specified. Why was no overlapping applied? Was any preprocessing (including digital filtering) applied? The time-domain to frequency-domain transform should be added.

 

Response 4: In this paper, the signals are divided into windows of 5 seconds, with a frame length of 0.5 seconds. Overlapping reuses data, which can be computationally intensive. In addition, the Bi-LSTM used later requires the order of the input data to be maintained, and overlapping causes temporal intersections between adjacent features, which can make the model less accurate. As described in Figure 1 and the associated text, to expand the number of samples, the original samples are first segmented into 5-second windows; an FIR filter then splits each window into sub-bands, and the PSD and DE features are extracted from the sub-band signals. The detailed procedure for the time-to-frequency conversion is described in Section 3.2.
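
The segmentation in Response 4 can be sketched as below. The sketch assumes that each 5-second window is cut into consecutive, non-overlapping 0.5-second frames, which is one reading of the response; the function names `segment` and `differential_entropy` are hypothetical. The DE formula used, 0.5 · ln(2πeσ²), is the standard closed form for a Gaussian-distributed signal, and the sub-band FIR filtering step is omitted here.

```python
import numpy as np


def segment(trial, fs=128, window_s=5.0, frame_s=0.5):
    """Cut a (channels, samples) trial into non-overlapping windows and frames.

    Returns an array of shape (n_windows, frames_per_window, channels, frame_len).
    """
    win_len = int(window_s * fs)
    frame_len = int(frame_s * fs)
    n_windows = trial.shape[-1] // win_len
    frames_per_window = win_len // frame_len
    usable = trial[:, : n_windows * win_len]  # drop any incomplete tail
    x = usable.reshape(trial.shape[0], n_windows, frames_per_window, frame_len)
    return x.transpose(1, 2, 0, 3)  # temporal order of frames is preserved


def differential_entropy(frames):
    """Differential entropy per frame under a Gaussian assumption."""
    var = frames.var(axis=-1)
    return 0.5 * np.log(2 * np.pi * np.e * var)
```

For a 60-second, 32-channel trial at 128 Hz, `segment` yields 12 windows of 10 frames each, so every trial contributes 12 samples, each an ordered sequence of 10 frames.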

 

Point 5: Table 1 and associated text: It is not clear which electrodes were used for feature extraction. Were mental activities recorded by all electrodes with the same intensity?

 

Response 5: All 32 channels of EEG signals acquired in the DEAP dataset were used to construct a 9 × 9 feature matrix. The brain can be divided into five areas: frontal, temporal, parietal, occipital and central, each responsible for a different physiological function. The frontal lobe is the main area of the brain that performs higher functions and is responsible for generating thoughts and emotions; its prefrontal area is particularly prominent when emotional mechanisms are triggered. The parietal lobe is mainly responsible for the perception of stress, pain and other stimuli; when the brain is stimulated and emotions change, the posterior part of the parietal lobe becomes active. The occipital and temporal lobes are the higher centres that process vision and hearing, respectively, and also have some cognitive, emotional and psychological functions. The central area is mainly responsible for integrating spatial information from different regions and is where most spatial information processing takes place. The 32 electrodes are located in these five regions, so the intensity of the mental activity picked up by different electrodes differs from electrode to electrode and over time.
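
One way to place 32 per-channel feature values on a 9 × 9 grid is sketched below. The response does not give the actual electrode-to-cell assignment, so the `GRID_POSITIONS` table is an assumed layout that roughly follows the 10-20 scalp positions, and `to_grid` is a hypothetical helper; cells without an electrode are left at zero.

```python
import numpy as np

# Assumed (row, col) cell for each electrode; illustrative only.
GRID_POSITIONS = {
    "Fp1": (0, 3), "Fp2": (0, 5),
    "AF3": (1, 3), "AF4": (1, 5),
    "F7": (2, 0), "F3": (2, 2), "Fz": (2, 4), "F4": (2, 6), "F8": (2, 8),
    "FC5": (3, 1), "FC1": (3, 3), "FC2": (3, 5), "FC6": (3, 7),
    "T7": (4, 0), "C3": (4, 2), "Cz": (4, 4), "C4": (4, 6), "T8": (4, 8),
    "CP5": (5, 1), "CP1": (5, 3), "CP2": (5, 5), "CP6": (5, 7),
    "P7": (6, 0), "P3": (6, 2), "Pz": (6, 4), "P4": (6, 6), "P8": (6, 8),
    "PO3": (7, 3), "PO4": (7, 5),
    "O1": (8, 3), "Oz": (8, 4), "O2": (8, 5),
}


def to_grid(values, channel_names, size=9):
    """Scatter one feature value per channel onto a size x size grid."""
    grid = np.zeros((size, size), dtype=float)
    for name, value in zip(channel_names, values):
        row, col = GRID_POSITIONS[name]
        grid[row, col] = value
    return grid
```

Calling `to_grid` once per frequency band and feature type gives a stack of sparse 9 × 9 maps that a convolutional layer can process while neighbouring electrodes stay adjacent in the input.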

 

Point 6: Section 3.3 NETWORK ARCHITECTURE: The mathematical description should be more precise. Which software tools were used for data processing? And which of the methodology is the authors' own?

 

Response 6: The software tools used for data processing are explained in detail in Section 4.2, specifically CUDA 11.2 and PyTorch 1.14. I have added the full model architecture at the beginning of Section 3 to illustrate the methodology presented in this paper.

 

Point 7: Section 5 CONCLUSIONS: I suggest specifying the processing goals in more detail. Further research should be better specified.

 

Response 7: Following your suggestions, I have amended the conclusion section accordingly.

 

Point 8: All abbreviations should be defined before their first use (including SNR, DWT, CV, …)

 

Response 8: In response to your comments on the abbreviations, I have made changes in the appropriate places throughout the text.

 

Dear Reviewers, I have extensively improved the language throughout the text.

Round 2

Reviewer 2 Report

The paper "Cascaded Convolutional Recurrent Neural Networks for EEG Emotion Recognition Based on Temporal-Frequency-Spatial Features" is acceptable in my opinion.
In the future, the authors should apply the method to more recent emotion datasets, e.g., the AMIGOS dataset.

The writing needs to be more scientific in style. You should use Grammarly or a similar service to edit the language of the article.

Author Response

Dear Reviewer
Thank you very much for your comments on my manuscript. I have sent the article to a professional service for language polishing and grammar checking, and I have also carefully reviewed the content. In the future, I will take your advice and conduct experiments on more recent datasets to verify the validity of the model proposed in this paper.


Reviewer 3 Report

Most comments were answered, and I suggest correcting only several formal mistakes (no indentation of the text after equations, ...).

Minor improvements are necessary.

Author Response

Dear Reviewers
Thank you very much for your comments on my manuscript. I have sent the article to a professional agency for language polishing and grammar checking, and I have also carefully reviewed the content. The formal errors in the article have been corrected in the revised version.

