1. Introduction
A brain-computer interface (BCI) is a computer-based system that acquires, analyzes, and translates brain signals into commands that are relayed to an output device to carry out a desired action. Owing to advances in this field, brain signals can now be used to operate devices [1]. Electroencephalography (EEG) is the most widely used brain signal because it is measured from the scalp (non-invasive), is low cost, and has a high temporal resolution [2]. Processing EEG signals is challenging because they are non-stationary, highly susceptible to artifacts, and frequently exposed to external noise. Additionally, the subject’s posture and attitude can affect the EEG readings [3].
The EEG signal is the electrical activity of the brain recorded from the scalp, and it is made up of several underlying base frequencies. These frequencies indicate specific emotional, cognitive, or attentional states. Most of the research has used a frequency range of 0–35 Hz [4].
This study concentrated on EEG signals derived from motor imagery (MI), the process of imagining limb movement. MI data are generated when a subject imagines moving a body part, such as the right or left hand (or both), the right or left foot, any of the five fingers, the tongue, or any other limb. Researchers demonstrated in the early 2000s that the most effective strategy for detecting EEG-based MI was to employ common spatial patterns (CSP). The purpose of the CSP algorithm is to identify a set of linear transformations, frequently referred to as spatial filters, that maximize the difference in variance between classes. In an MI-EEG task, these classes correspond to the imagined movements that were recorded, such as the right hand, left hand, and feet. After the spatial filters have been estimated, the data representation is constructed using the relative energy of the filtered channels. This multi-dimensional representation can then be fed directly into a linear classifier, such as a support vector machine (SVM), to achieve high accuracy [5].
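The CSP pipeline described above can be sketched in a few lines. The following is a minimal two-class illustration that assumes NumPy is available and that trials are zero-mean (for example, band-pass filtered). It is not the implementation used in any cited work: the function names `class_covariance`, `fit_csp`, and `csp_features`, the trace normalization, and the log-variance features are all assumptions chosen for illustration.

```python
import numpy as np


def class_covariance(trials):
    """Average trace-normalized spatial covariance of one class.

    trials: array of shape (n_trials, n_channels, n_samples).
    """
    covs = []
    for x in trials:
        c = x @ x.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)


def fit_csp(trials_a, trials_b, n_pairs=1):
    """Return 2 * n_pairs spatial filters (one per row) for two classes."""
    cov_a = class_covariance(trials_a)
    cov_b = class_covariance(trials_b)
    # Whiten the composite covariance so that W (Ca + Cb) W^T = I.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = np.diag(evals ** -0.5) @ evecs.T
    # In the whitened space the class-A and class-B variances along each
    # eigenvector sum to 1, so the extreme eigenvectors discriminate best.
    _, rotation = np.linalg.eigh(whitener @ cov_a @ whitener.T)
    filters = rotation.T @ whitener  # rows sorted by ascending class-A variance
    picks = list(range(n_pairs)) + list(range(-n_pairs, 0))
    return filters[picks]


def csp_features(filters, trials):
    """Log of the relative energy (variance) of each spatially filtered channel."""
    feats = []
    for x in trials:
        var = np.var(filters @ x, axis=1)
        feats.append(np.log(var / var.sum()))
    return np.array(feats)
```

The resulting feature matrix (one row per trial) can then be passed to any linear classifier, such as an SVM.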
MI-based EEG is a growing area of interest in the field of BCI, with not only enormous potential but also vital applications (e.g., gaming [6], robotics [7,8], and therapeutic applications [9,10]). There are, however, significant limitations in terms of data collection and classification techniques. The objective of this research is to develop an end-to-end classification model based on deep learning that is capable of reliably classifying MI-EEG signals with high kappa values, where kappa measures the agreement between predicted and true labels after correcting for the agreement expected by chance. Despite deep learning’s growing popularity in a variety of fields, it has yet to produce satisfactory results when used to classify MI-EEG signals. The high dimensionality of EEG data (multiple channels and a high sampling rate), the presence of artifacts (such as motion), noise, and channel correlation make the design of an optimal EEG classification model using deep learning (DL) difficult.
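As an illustration of the kappa metric mentioned above, the following minimal sketch computes Cohen's kappa from true and predicted labels in plain Python. The function name `cohen_kappa` is hypothetical, and this is not the evaluation code used in the study.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences beyond chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of samples where the labels match.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if the two label sequences were independent.
    p_expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    if p_expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (p_observed - p_expected) / (1.0 - p_expected)
```

A kappa of 1 indicates perfect agreement, while a kappa of 0 indicates agreement no better than chance.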
According to preliminary observations, the main difficulty with EEG MI classification is that it is a highly subject-specific task: each person has unique traits that the system must capture to classify the MI movement correctly. This issue can be addressed through the use of multi-scale, multi-branch, or parallel architectures, which increase the model’s generality. However, this type of model is typically computationally expensive, requiring a larger number of parameters and a longer training period. We therefore present in this paper a DL-based EEG MI classification model that is lightweight and capable of dealing with subject-specific tasks using fixed hyperparameters, making it more suitable for use in real-world applications. The following are the primary contributions of the paper:
Build an end-to-end multi-branch EEG MI classification model based on DL that can solve the subject-specific problem.
Develop a lightweight multi-branch attention model that can accurately classify EEG MI signals with a small number of parameters.
Create a robust general model with fixed hyperparameters.
Using multiple datasets, test the usefulness and robustness of the proposed model against data fluctuations.
In Section 2, we provide a summary of related research publications on MI-EEG classification algorithms. Section 3 presents the proposed model, multi-branch EEGNet with squeeze-and-excitation block (MBEEGSE), while Section 4 and Section 5 contain a discussion of the experimental data and results, and a conclusion, respectively.
2. Related Works
With just one processing block, deep learning can complete the whole feature extraction, selection, and classification pipeline. Convolutional neural networks (CNNs) [
11,
12,
13,
14] are the most frequently used architecture in MI EEG processing, but other architectures like recurrent neural networks (RNNs) [
12,
15], deep belief networks (DBNs) [
12], and stacked autoencoders (SAEs) [
13] have been utilized as well. Because EEG MI signals are nonlinear and non-stationary, CNNs have an advantage over other deep learning techniques. These signals possess both temporal and spatial features: temporal features arise from the time spent imagining the movement, and spatial features arise because data are acquired simultaneously from several electrodes placed at different locations. CNNs therefore provide several advantages for analyzing MI EEG data, including high accuracy on large datasets, the ability to exploit the hierarchical nature of particular signals, and the ability to learn both temporal and spatial information concurrently.
Numerous studies used data preparation procedures before feeding information into a CNN. ConvNet [
16], which uses convolutional layers to extract temporal and spatial information and was inspired by the filter-bank CSP (FBCSP) [
17], was the first interesting technique that used raw EEG data. Two comparable MI topologies were introduced in [
18]: the ShallowConvNet, which is a shallow convolutional network with two convolutional layers and classification layers, and the DeepConvNet, which is a deep design with additional aggregating layers. The EEGNet was proposed in [
19] as a compact version of previous approaches. It is based on depth-wise convolution and separable convolution, which minimizes the network’s parameter count. Following that, similar structures were proposed, one of which was published by Riyad et al. in [
20]. The first half of the model is identical to EEGNet, with the second half containing an inception block. To improve the performance of EEGNet, the researchers applied temporal convolutional networks (TCNs) in [
21]. All of these architectures address the shortcomings in EEGNet, such as its shallow and compact design, which restricts network capacity and, in most situations, leads to overfitting. Due to a degradation issue, performance remains low even with a deeper network. As a result, it is recommended to utilize a multibranch model that incorporates attributes from different branches.
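To illustrate why depthwise and separable convolutions reduce the parameter count, the sketch below counts trainable weights for each layer type. The helper names are hypothetical, and the layer sizes used in the note that follows are illustrative values rather than EEGNet's published configuration.

```python
def conv2d_params(in_ch, out_ch, kh, kw, bias=True):
    """Standard convolution: every output channel sees every input channel."""
    return in_ch * out_ch * kh * kw + (out_ch if bias else 0)


def depthwise_conv2d_params(in_ch, depth_multiplier, kh, kw, bias=True):
    """Depthwise convolution: each input channel has its own private kernels."""
    out_ch = in_ch * depth_multiplier
    return out_ch * kh * kw + (out_ch if bias else 0)


def separable_conv2d_params(in_ch, out_ch, kh, kw, bias=True):
    """Separable convolution: one depthwise kernel per input channel,
    followed by a 1x1 pointwise convolution that mixes channels."""
    depthwise = in_ch * kh * kw
    pointwise = in_ch * out_ch
    return depthwise + pointwise + (out_ch if bias else 0)
```

For example, with 16 input channels, 16 output channels, and a 1 x 16 kernel (no bias), the standard convolution needs 4096 weights, whereas the separable version needs 512.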
In [
22], Amin et al. combined multilayer CNNs with two separate feature fusion techniques: a multilayer perceptron (MLP) and autoencoders to produce a new approach to EEG signal classification. The authors examined different levels of CNNs to extract the most significant features, and then combined them before classification to improve the accuracy of EEG-based MI. Their models were trained on the high gamma dataset (HGD) to avoid overfitting. In [
23], the same researcher presented an attention-based inception model that contains two attention blocks. Each attention block comprises three parallel convolutions with varying filter sizes, followed by an attention vector that fuses all of the features collected from the convolution process. As demonstrated in [
24], a 3D CNN is used in EEG-based MI because it improves classification in image/video processing applications. In [
24], Zhao et al. proposed a multi-branch 3D CNN for preserving spatial and temporal properties. They represented the EEG as a sequence of 2D arrays based on the electrode placements, then extended these arrays to a 3D array using the temporal information from the EEG.
We found no previous research that used raw MI-EEG signals as input to multi-branch 2D CNNs. In [24,25], the authors used a multi-branch architecture with a 3D CNN, taking a 3D EEG representation as input and applying 3D filters. Compared with 3D filters, we believe that using a 2D CNN with two 1D filters, one along time and one along space, will reduce computational complexity and improve the model’s ability to deal with subject-specific variability. According to the researchers in [26], flattened networks, which use only one-dimensional filters across all three dimensions, perform as well as, or better than, conventional convolutional networks while requiring far less computation. A 3D filter is also more difficult to implement in real-time applications than a 1D filter.
A multi-branch model’s fundamental concept is that the raw or prepared input is routed through multiple subnetworks, each with its own set of characteristics. The authors of [
27] developed a CP-MixedNet architecture that used multiscale EEG features extracted from a series of convolution layers, each of which captures EEG temporal information at various scales. In [
28] the authors propose a parallel spatial-temporal representation of raw EEG signals that makes use of the self-attention process to generate separate spatial-temporal features. To encode spatial correlations between MI EEG channels, they exploited the spatial self-attention module in particular. Additionally, the temporal self-attention module transforms global temporal information into sample time step characteristics, enabling time-domain extraction of high-level temporal aspects in MI EEG data. The authors of [
29] divided the original signal into three band-limited signals by filtering it across separate band ranges. They varied the size of the temporal convolutional filter in each band range, resulting in nine parallel branches, three for each filter band. This resulted in a massive number of parameters, totaling over 1215 K for the entire system and 405 K for a single filter band. This large parameter count limits the system’s use in many applications. Furthermore, because the spatial filter size did not change, the method did not account for the impact of shifting neighborhoods in channels.
The authors proposed a more advanced method in [
30]. It is a temporal-spectral-based squeeze-and-excitation feature fusion network (TS-SEFFNet). In a cascade architecture, the deep-temporal convolution block (DT-Conv block) is the first section of their model, which employs convolutions to extract high-dimension temporal representations from raw EEG data. The multispectral convolution block (MS-Conv block) is then run in parallel using multilayer wavelet convolutions to capture discriminative spectral information from matching sub-bands. The final recommended block was the squeeze-and-excitation feature fusion block (SE-Feature-Fusion block), which was used to fuse deep-temporal and multispectral data into comprehensive fused feature maps. Interdependencies between different domain characteristics are introduced, bringing channel-specific feature responses to the forefront. It is a sizable model with numerous parameters (282 K).
In [31], a hybrid of multi-scale processing and an attention mechanism was presented. The authors built a multi-scale fusion convolutional neural network based on the attention mechanism (MS-AMF). To preserve as much information flow as possible, the network captures spatiotemporal multi-scale features from signals representing multiple brain areas and applies a dense fusion mechanism. The attention mechanism they used, a squeeze-and-excitation (SE) block, increased the network’s sensitivity. However, this model includes a data preparation stage before the data are fed into the network. Jia et al. [32] suggested the Multibranch Multi-scale Convolutional Neural Network (MMCNN), an end-to-end approach for decoding raw EEG signals that does not require any pre-processing or filtering. It is a large model with several branches at each scale, which increases its complexity and results in a high number of parameters. It is composed of five parallel branches that each contain an EEG Inception block, a residual block, and an SE block.
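The squeeze-and-excitation (SE) attention used by several of the models above can be sketched as follows. This is a minimal pure-Python illustration of the standard SE formulation (global average pooling, a bottleneck dense layer with ReLU, a dense layer with sigmoid, then channel-wise scaling). It is not the implementation of any cited model, and the function name `se_block` and the argument layout are assumptions.

```python
import math


def se_block(feature_maps, w1, b1, w2, b2):
    """Apply squeeze-and-excitation to a list of channel feature maps.

    feature_maps: list of C channels, each a flat list of activations.
    w1, b1: weights (C // r rows of C values) and biases of the reduction layer.
    w2, b2: weights (C rows of C // r values) and biases of the expansion layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_maps]
    # Excitation, step 1: bottleneck dense layer with ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(w1, b1)
    ]
    # Excitation, step 2: dense layer with sigmoid gives a gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w2, b2)
    ]
    # Scale: reweight every channel by its gate.
    return [[g * v for v in channel] for g, channel in zip(gates, feature_maps)]
```

In a trained network the weights are learned, so informative channels receive gates close to 1 and less useful channels are attenuated.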
Our proposed model, in contrast to existing multibranch, multiscale, and parallel networks, exploits the key idea of multibranch architectures, namely varying the kernel size across branches, to improve classification accuracy while keeping both the complexity and the number of parameters low.
4. Results and Discussion
The mental and physical states of research subjects can vary substantially in EEG-MI studies. To account for this, we classified the data in this study using the within-subject technique. In other words, the model is trained and tested using data from multiple sessions recorded for the same person [22]. The within-subject technique is applied to both the BCI-IV2a and the HGD datasets: for each dataset, one session is used for training and the other for testing. Global hyperparameters are used for all subjects in the proposed model for both datasets, as indicated in Table 1. We previously examined the optimal hyperparameters for the EEGNet blocks in [38]. During the training phase, a callback saves the model weights that achieve the best accuracy so far, and the best saved model is then loaded during the test phase. The model is trained for 1000 epochs with a batch size of 64 and a learning rate of 0.0009. Cross-entropy was used as the cost function, together with the Adam optimizer. All experiments were run in Google’s Colab environment using the TensorFlow deep learning library and the Keras API.
4.1. Overall Comparison
Using the aforementioned BCI-IV2a and HGD datasets, the performance of the proposed approach is compared to that of open-source end-to-end models and alternative multibranch methods.
FBCSP is a handcrafted model for classifying motor imagery EEG data that is often used as a baseline method [
17]. It won several EEG decoding competitions, including the BCI competition IV in both datasets 2a and 2b. The CSP features are retrieved from different frequency bands in this model before being classified using the SVM [
17].
ShallowConvNet is a deep learning network that can categorize MI-EEG with only two convolution layers and a mean pooling layer [
11].
DeepConvNet is a deeper deep learning model than ShallowConvNet. It consists of four convolution and max-pooling layer blocks, followed by a softmax layer [
11].
EEGNet is a deep learning model that uses two-dimensional temporal convolution, depthwise convolution, and separable convolution to achieve a consistent approach to various BCI tasks [
19].
CP-MixedNet is a multi-scale model that extracts EEG features from many convolution layers, each of which captures EEG temporal information at different scales [
27].
TS-SEFFNet is a multi-block system that employs attention and fusion techniques. The spatio-temporal block, the deep-temporal convolution block, the multi-spectral convolution block, the squeeze-and-excitation feature fusion block, and the classification block are all part of a larger model [
30].
CNN + BiLSTM (fixed) is a hybrid deep learning model which contains an attention-based inception model and the LSTM model. It was tested and analyzed with fixed hyperparameter values, which were fixed for all subjects [
15].
We also compared our findings to earlier research [38], which included lightweight multibranch models without attention blocks: Multi-branch EEGNet (MBEEGNet) and Multi-branch ShallowConvNet (MBShallowConvNet). As seen in Table 2, the attention block improves accuracy by about 1%.
Table 2 summarizes the classification accuracies achieved on the BCI-IV2a and HGD datasets by the baseline models mentioned above. As can be seen, our approaches have the highest average accuracy, kappa, and F1 score. Note that we compared our results only with results obtained using the same training method (within-subject).
4.2. Results of BCI Competition IV-2a Dataset
All of the proposed models were trained using session “T” from the BCI-IV2a dataset and tested on session “E.” In the experiments, a subject-specific method was used. Classification accuracy, Cohen’s kappa score, precision, recall, F1 score, and the number of parameters were all employed to compare the proposed model against state-of-the-art MI-EEG classification models.
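The precision, recall, and F1 metrics listed above can be derived from a confusion matrix, as in the following minimal sketch. Macro averaging over classes is an assumption here, since the averaging scheme is not stated in this section, and the function name `macro_scores` is hypothetical.

```python
def macro_scores(confusion):
    """Return macro-averaged (precision, recall, F1) from a confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    n_classes = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n_classes))
        actual_k = sum(confusion[k])
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (
        sum(precisions) / n_classes,
        sum(recalls) / n_classes,
        sum(f1s) / n_classes,
    )
```

For a four-class MI task, the input would be a 4 x 4 matrix with one row per true class and one column per predicted class.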
Figure 4 illustrates our method’s performance in comparison to the baseline models on BCI-IV2a. As shown in the figure, the proposed model outperforms the other baseline models on BCI-IV2a by more than 7% and outperforms the same model without attention blocks by about 1%.
One of the study’s primary objectives is to identify the hyperparameters in each branch that improve classification accuracy with the least added complexity. We therefore began by performing multiple experiments to determine the optimal hyperparameters in the EEGNet block [38]. Then, we conducted additional experiments to determine the optimal reduction ratio for the SE block. Figure 5 compares the accuracy of different reduction ratios in the SE block on various EEGNet blocks. As illustrated in Figure 5, EEGNet Block 3 outperforms the other blocks across reduction ratios, with an average accuracy of around 79%. A reduction ratio of 4 gives the highest accuracy in EEGNet Blocks 1 and 2, whereas a ratio of 2 is more accurate in EEGNet Block 3. The experiments revealed that the number of parameters increases with the number and size of filters in the EEGNet block and with the reduction ratio in the SE block. As a result, we selected a reduction ratio of 2 for EEGNet Block 3 and a reduction ratio of 4 for EEGNet Blocks 1 and 2. This is the set of SE-block hyperparameters we used in each branch of our proposed model for both datasets, as listed in Table 1.
The proposed model was compared to state-of-the-art MI-EEG classification models using classification accuracy, Cohen’s kappa score, precision, recall, and F1 score. Table 3 summarizes the findings from the BCI-IV2a dataset using MBEEGSE. Even with this increase in average accuracy, the model retains a small number of parameters. To better characterize the proposed method’s computational complexity, we count the parameters in our model and compare the total to existing multi-branch techniques. As shown in Table 4, the proposed MBEEGSE has a total of 10,170 parameters across all branches, far fewer than other multi-branch models such as TS-SEFFNet and CP-MixedNet, which have 282,000 and 836,000 parameters, respectively.
The time required to predict a motor imagery class from an EEG test sample was measured in Python. In the Google Colab environment, our proposed model takes an average of 1.79 milliseconds to predict the class. Additionally, we calculate the information transfer rate (ITR), which is a critical evaluation metric when developing an embedded system. It is a widely used metric for assessing the communication performance of control systems, more specifically BCIs [
39,
40]. The ITR is the quantity of information transmitted per unit of time. Typically, the ITR is expressed in bits/min using the following formula:

$$\mathrm{ITR} = T\left[\log_2 C + A\log_2 A + (1 - A)\log_2\left(\frac{1 - A}{C - 1}\right)\right]$$

where T is the number of decisions per minute, C is the number of classes (in our case, we have four MI classes), and A is the accuracy. As mentioned above, 4.5 s were used from each trial, so 13.33 trials can be processed per minute. The average accuracy of the method is A = 0.8287, and the ITR achieved for each subject in the BCI-IV2a dataset is presented in Table 5. From the table, we can see that the average ITR achieved was 14.93 bit/min, which is a good value in BCI applications [
41].
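The ITR calculation can be sketched as follows, assuming the widely used Wolpaw definition in terms of the accuracy A, the number of classes C, and the number of decisions per minute T. The function name `itr_bits_per_min` is hypothetical, and this is not the code used to produce Table 5.

```python
import math


def itr_bits_per_min(accuracy, n_classes, decisions_per_min):
    """Wolpaw information transfer rate in bits per minute."""
    if not 0.0 <= accuracy <= 1.0 or n_classes < 2:
        raise ValueError("accuracy must be in [0, 1] and n_classes >= 2")
    a, c = accuracy, n_classes
    bits = math.log2(c)
    if a > 0.0:
        bits += a * math.log2(a)
    if a < 1.0:
        # The A = 1 case is skipped because 0 * log2(0) is taken as 0.
        bits += (1.0 - a) * math.log2((1.0 - a) / (c - 1))
    return decisions_per_min * bits
```

With A = 0.8287, four classes, and 60 / 4.5 ≈ 13.33 decisions per minute, this function gives roughly 14.2 bit/min. Averaging per-subject ITRs generally yields a somewhat different value than evaluating the formula once at the average accuracy, because the formula is nonlinear in A.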
To investigate the discriminative power of the features extracted by our MBEEGSE in greater detail, t-SNE is used to visualize the learned features. The t-SNE transforms the extracted EEG features into a two-dimensional embedding, as illustrated in Figure 6. In comparison to ShallowConvNet [11], DeepConvNet [11], and EEGNet [19], our MBEEGSE model implements multi-branch feature extraction and captures more MI-EEG features with fewer parameters. The proposed model’s feature visualizations also demonstrated that it was capable of extracting both temporal and spectral features from EEG signals. Moreover, the proposed MBEEGSE generates more separable features than EEGNet, enabling it to distinguish between different types of MI-EEG signals efficiently. As a result, we can see that our MBEEGSE extracts the most discriminative EEG features, implying the highest decoding performance.
4.3. Results of HGD
The accuracy, kappa value, precision, recall, and F1 scores for each subject in the second dataset (HGD) are summarized in Table 6. Moreover, for the same dataset, Figure 7 shows the average classification accuracy of our proposed multibranch model (MBEEGSE) in comparison to the single-scale models FBCSP [17], ShallowConvNet [30], DeepConvNet [11], EEGNet [38], and the multiscale networks CP-MixedNet [27], TS-SEFFNet [30], and CNN + BiLSTM (fixed) [15]. The findings indicate that our model effectively addresses the issue of subject and session (time) differences, thereby increasing the accuracy of MI classification.
5. Conclusions
We proposed MBEEGSE, which is a lightweight multibranch model with attention blocks capable of increasing EEG MI classification accuracy while utilizing fewer parameters. Two publicly available datasets, BCI-IV 2a and HGD, were used to validate the performance of the model. The average accuracy and F1 score of the proposed model were 82.87% and 0.829 using the BCI-IV 2a dataset, and 96.15% and 0.962 using the HGD, respectively. The proposed model outperformed the base EEGNet model by more than 10% accuracy, and the multibranch EEGNet without attention blocks by 0.86% accuracy when using the within-subject strategy in the BCI-IV 2a dataset. Similarly, the proposed model performed better than other compared models using the HGD. Two major findings of this study are as follows:
The self-attention mechanism increases the accuracy of EEG-MI classification.
By applying variable optimum reduction ratios of the attention mechanism in different branches, we can reduce the number of parameters in the multibranch model of the EEG-MI classification.
Compared to the base EEGNet, the proposed model has 3.9 times as many parameters; however, the accuracy was improved by more than 10%. Although the number of parameters is larger than that in EEGNet, the three branches of the proposed model are independent of each other and can therefore be processed in parallel, which can significantly reduce the processing time.
In the future, we intend to investigate various attention strategies to increase the accuracy of EEG-MI classification models and to develop models that can be used in advanced BCI systems. Another direction for future work is to investigate which frequencies the model should attend to more strongly in order to achieve better accuracy than the proposed model.