1. Introduction
Recently, the applications of EEG-based Brain-Computer Interfaces (BCIs) have been the subject of increased research and development [1,2]. One of the most important applications is enabling people with motor disabilities to control a wheelchair [3], artificial limbs, or mobile robots [4]. Other significant applications are found in medicine. For example, BCI spellers based on code-modulated Visual Evoked Potentials (cVEP) help patients with Amyotrophic Lateral Sclerosis (ALS) communicate despite their severe motor impairments [5], and EEG analysis has been used in a BCI that detects sleep apnea [6].
Specifically, Motor Imagery (MI) is widely used in current BCI systems, and improvements to MI-BCI systems are constantly proposed, such as a finger rehabilitation system that decoded the movement of the right-hand index finger [7] and the Capsule Network (CapsNet) [8]. These applications highlight the effectiveness of pattern recognition methods based on MI-EEG signals [9]. MI-EEG signals are generated when a subject imagines a body limb movement. Such signals are mainly captured over the sensorimotor cortex [10,11], located in the posterior area of the frontal lobe and involved in imagined movements and muscle control [12].
The sensorimotor cortex produces similar activation patterns during imagined and physical movements, according to the synchronization and desynchronization of the mu-rhythm [13]. Li Feng et al. implemented a left- and right-hand MI-EEG signal classifier for a BCI using a Continuous Wavelet Transform (CWT) and a simplified Convolutional Neural Network (CNN) [14]. Tyagi and Nehra analyzed and extracted relevant features from MI-EEG signals using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) [15]. Afterward, the features were classified using Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and an Artificial Neural Network (ANN).
EEG signals measured by non-invasive systems are difficult to classify because of noise added by the sensors, which leads to a low Signal-to-Noise Ratio (SNR). Besides, the electrodes placed on the scalp suffer a loss of signal potential due to volume conduction effects [16]. Moreover, the processing and classification of MI-EEG signals generated during imagined finger movements present additional technical challenges. Large limb movements involve a substantial number of inter-neuronal connections because the muscles involved require a more significant amount of energy than the fingers [17]. Therefore, the classification of imagined finger movements becomes complex because the fingers move closely together, and the specific features of individual finger movements merge.
Kaya et al. developed a public MI-EEG dataset with five BCI paradigms [18]; it includes a paradigm related to the MI of individual finger movements. In that study, the authors reported only 43% accuracy in finger movement decoding using an SVM classifier. Xiuling Liu et al. [19] proposed a parallel spatial-temporal self-attention-based convolutional neural network for MI-EEG signal classification. This method builds a spatial-temporal representation of the raw EEG signals and uses the self-attention mechanism to extract distinguishable spatial-temporal features. It outperformed state-of-the-art methods for intra-subject and inter-subject classification. Jiacan Xu et al. [20] proposed a deep multi-view feature learning method for the classification of MI-EEG signals. First, a multi-view representation of the EEG signals is obtained by extracting time-domain, frequency-domain, time-frequency-domain, and spatial features. Afterward, a parametric t-SNE method extracts deep features from the multi-view representation. Then, a Support Vector Machine (SVM) classifier classifies those deep features. The proposed method was tested on the BCI Competition IV 2a dataset, obtaining excellent classification results. Lawhern et al. [21] proposed a convolutional network architecture named EEGNet for EEG signal classification. The architecture includes a filter-bank structure based on two convolutional layers to adaptively extract common spatial patterns. The authors showed that EEGNet generalizes well across BCI paradigms and achieves performance comparable to other methods, especially in the case of limited training data. Anam et al. [22] implemented an Autonomous Deep Learning (ADL) architecture for the classification of individual finger movements based on MI-EEG signals. ADL is an architecture capable of constructing its own structure and adapting to input changes. The authors showed that, for subject-dependent classification, ADL achieves an accuracy of around 77%.
The present work aims to process and classify MI-EEG signals of individual finger movements from one hand by addressing the problem of noisy signals with a method based on Empirical Mode Decomposition (EMD) and by using more powerful sequence classification architectures, including Bidirectional Long Short-Term Memory (BiLSTM) Recurrent Neural Networks (RNN). Hence, this study proposes an EMD-based preprocessing stage followed by a stacked BiLSTM network classifier. The main contributions of this paper are summarized as follows:
An approach to decode imagined individual finger movements from one hand based on a stacked BiLSTM architecture.
An approach for tackling noisy MI-EEG signals based on EMD.
An improved state-of-the-art result for the task of subject-dependent imagined finger movement classification.
2. Materials and Methods
2.1. Dataset
The EEG BCI dataset was built by Kaya et al. [18], considering five interaction paradigms related to motor imagery. In particular, this paper focuses on the MI of the five right-hand fingers, corresponding to paradigm #3 (5F). The subset of the dataset corresponding to finger movement imagery consists of MI-EEG signals from eight subjects, captured with Nihon Kohden (Japan) EEG-1200 JE-921A equipment. Two women and six men aged between 20 and 35 produced 19 session files of 4600 MI-EEG samples per subject. This dataset provides 45 min of MI-EEG for all subjects divided into three interaction segments; each segment consists of the presentation of about 300 MI symbols. The equipment uses 22 electrodes; 19 are active and distributed according to the international 10-20 standard for EEG electrode positioning, as shown in Table 1.
In the creation protocols of the dataset, the developers assert that the test subjects were in good physical and mental health at capture time [18].
A recliner chair, suited to all participants, was placed at 200 cm from the monitor and slightly above the eye reference level. Then, an eGUI displayed the five fingers of the right hand. When a number from one to five was displayed just above a finger as the signal to start the task, the test subject executed the corresponding imagery movement for one second. Digits 1, 2, 3, 4, and 5 correspond to the thumb, index finger, middle finger, ring finger, and pinkie, respectively. The task involves imagining the flexion of a finger up and down. This paradigm does not include a neutral state, since signals related to such a state were not considered in the original dataset.
MI-EEG signals were recorded at 200 Hz and 1000 Hz, where the latter is referred to as High Frequency (HFREQ). The 5F dataset contains thirteen HFREQ files and six files at 200 Hz, collected between 2015 and 2016. The Neurofax software [24] was used to bandpass filter the raw MI-EEG signals from 0.53 to 70 Hz for the 200 Hz signals and from 0.53 to 100 Hz for the HFREQ signals. The signals from each sensor are arranged into a matrix as follows:

EEG(t) = [s_1(t), s_2(t), …, s_m(t)] ∈ R^(n×m),   (1)

where n and m are the number of samples and the number of signals, respectively. All 19 files (13 HFREQ and six at 200 Hz) from the 5F dataset containing the captured signals (lasting from 3582 to 4040 s) were retrieved and directly used in the proposed signal processing framework, since a preliminary preprocessing was already applied during and after capture in the creation of the EEG BCI dataset [18].
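For illustration, the bandpass ranges applied to the raw signals can be approximated offline with a zero-phase Butterworth filter; this is a minimal sketch of such a filter and not the Neurofax implementation, and the synthetic 4-channel signal is only a stand-in for real recordings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(eeg, fs, low=0.53, high=70.0, order=4):
    """Zero-phase Butterworth band-pass approximating the dataset's
    filtering ranges (0.53-70 Hz for the 200 Hz recordings)."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    # filtfilt applies the filter forward and backward (zero phase)
    return filtfilt(b, a, eeg, axis=0)

# Usage: filter 2 s of a synthetic 4-channel recording sampled at 200 Hz
fs = 200
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)[:, None] + 0.1 * np.random.randn(2 * fs, 4)
y = bandpass(x, fs)
```

For the HFREQ files the same sketch would use `fs=1000` and `high=100.0`.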
2.2. Overall Flowchart
The channels C3, Cz, P3, and Pz were selected to decode the right-hand finger movements in the MI-EEG signals. This choice considers channels located over the primary motor cortex (M1) and the cerebellum, which are involved in the generation of motor imagery signals [25]. Since right-hand finger imagery movements activate the left cerebral hemisphere and left-hand finger imagery movements activate the right one, the P3 and C3 electrode signals are processed, including those from Cz and Pz for the cortex activation maps during the predefined tasks [26,27]. In addition, the {C3, Cz, P3, Pz} combination revealed better decoding accuracy than the {C3, Cz}, {C3, P3}, {C3, P3, Cz}, and {T3, C3, Cz} combinations evaluated in the electrode preselection step.
Figure 1 shows the head positioning configuration of the corresponding electrodes, which are the most representative and discriminant electrodes in the proposed tasks.
Furthermore, Figure 2 shows the flowchart of the proposed approach.
2.3. EEG Signal Preprocessing Based on EMD
One problem when dealing with EEG signals is that they are inherently non-stationary. This is because brain processes change with brain state, e.g., with mental fatigue [28]. This non-stationarity has severe implications for the generalization ability of deep neural network architectures [29].
Current approaches for dealing with non-stationarity are mainly based on trend removal [30]. The main trend removal approaches are high-pass filtering, moving average removal, polynomial fitting, and empirical mode decomposition. Among these, empirical mode decomposition is one of the most reliable methods in terms of efficiency and simplicity [30,31].
The EMD method decomposes a signal into a sum of Intrinsic Mode Functions (IMFs). The IMFs obtained from natural EEG signals provide analytical features (amplitude, frequency, and phase) that improve the BiLSTM learning algorithm; this is a specific benefit of the EMD approach targeted in this study.
Empirical Mode Decomposition (EMD) is a signal processing tool proposed by Huang et al. to analyze nonlinear and non-stationary signals [32]. IMFs must fulfill the following two constraints: (1) the number of extrema and the number of zero crossings must be equal or differ by at most one; and (2) at any point, the mean value of the envelopes defined by the local maxima and the local minima is zero.
EMD can be used to denoise 1-D EEG signals because of the frequency-decreasing property of the IMFs [33]. The IMFs represent the oscillation modes in the signal, so the first IMF contains the highest frequencies and the last IMF contains the lowest frequencies. Algorithm 1 shows the steps performed by the EMD algorithm. Once the sifting process is completed, the original MI-EEG signal can be recovered as follows:

EEG(t) = Σ_{i=1}^{N} IMF_i(t) + r_N(t),   (2)

where N is the number of IMFs computed from the original EEG signal and r_N(t) is the final residue. EMD operates similarly to a bank of bandpass filters for the modes with indexes greater than 1 and as a high-pass filter for mode 1 [34]. Therefore, Equation (3) describes the relation obtained when the first EMD step is applied to the EEG signal,

EEG(t) = IMF_1(t) + r_1(t),   (3)

where r_1(t) represents the low-frequency components of the signal.
Algorithm 1: MI-EEG Signal Decomposition using EMD.
1: Let EEG(t) be the matrix form of the signals.
2: Find the local maxima and local minima of EEG(t).
3: Use Cubic Spline interpolation to construct the upper envelope EEGU(t) and the lower envelope EEGL(t) connecting all the maxima and minima points, respectively.
4: Calculate the local mean EEGC(t) = (EEGU(t) + EEGL(t))/2.
5: Obtain Ri(t) = EEG(t) − EEGC(t).
6: Conclude the i-th IMF if Ri(t) satisfies the given IMF conditions, setting IMFi(t) = Ri(t); otherwise, repeat steps 2 to 5 on Ri(t).
7: Find the next component by subtracting EEG(t) − IMFi(t) and repeating the sifting process on the residue until a constant residue (with no more oscillations) is obtained.
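The sifting steps above can be sketched in a few lines of numpy/scipy. This is an illustrative mini-implementation, not the exact EMD variant used in the study: it uses a fixed number of sifting iterations as a simple stop rule, and boundary effects of the spline envelopes are ignored.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift(x, n_sift=10):
    """Extract one IMF by repeated envelope-mean subtraction (steps 2-6)."""
    r = x.copy()
    t = np.arange(len(x))
    for _ in range(n_sift):
        maxima = argrelextrema(r, np.greater)[0]   # step 2: local maxima
        minima = argrelextrema(r, np.less)[0]      # step 2: local minima
        if len(maxima) < 2 or len(minima) < 2:
            break                                  # too few extrema to sift
        upper = CubicSpline(maxima, r[maxima])(t)  # step 3: upper envelope
        lower = CubicSpline(minima, r[minima])(t)  # step 3: lower envelope
        r = r - (upper + lower) / 2.0              # steps 4-5: remove local mean
    return r

def emd(x, n_imfs=4):
    """Return the first n_imfs IMFs and the residue of signal x (step 7)."""
    imfs, residue = [], x.copy()
    for _ in range(n_imfs):
        imf = sift(residue)
        imfs.append(imf)
        residue = residue - imf   # continue sifting the remaining residue
    return np.array(imfs), residue

# Usage on a synthetic two-tone signal with a slow trend
t = np.linspace(0, 1, 1000)
x = np.sin(2*np.pi*5*t) + 0.5*np.sin(2*np.pi*40*t) + 0.2*t
imfs, residue = emd(x)
preprocessed = imfs.sum(axis=0)   # sum of the first four IMFs (Section 2.3)
```

By construction, the extracted IMFs plus the residue reconstruct the original signal, mirroring Equation (2).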
Huang et al. developed a method that allows determining the number of iterations needed to stop the sifting process [35]. This method is based on two criteria:
All local maxima are strictly positive, while all local minima are strictly negative; and
The number of extrema points remains unchanged.
Moreover, the sifting process also stops when the standard deviation of the difference between two successive sifting steps is smaller than a threshold [32]. This last sifting stoppage criterion is given by

SD = Σ_{t=0}^{T} |R_{i−1}(t) − R_i(t)|² / R²_{i−1}(t) < δ,   (4)

where δ is a predefined threshold.
On the other hand, since the IMFs have different frequencies at the analyzed time, their analytic form (AIMF) can be expressed as

AIMF_i(t) = a_i(t) e^(jθ_i(t)),   (5)

where a_i(t) and θ_i(t) are the instantaneous amplitude and phase of each IMF_i, respectively. These parameters can be estimated using the Hilbert transform [35] as follows:

a_i(t) = √(IMF_i(t)² + H[IMF_i](t)²),  θ_i(t) = arctan(H[IMF_i](t) / IMF_i(t)),   (6)

where IMF_i(t) represents both the corresponding IMF component and the real term of the AIMF, and H[IMF_i](t) is the Hilbert Transform (HT) of IMF_i(t), given by

H[IMF_i](t) = IMF_i(t) * (1/(πt)) = (1/π) ∫ IMF_i(τ)/(t − τ) dτ,   (7)

where * is the convolution operator. Consequently, the analytic form for the i-th IMF becomes

AIMF_i(t) = IMF_i(t) + j H[IMF_i](t).   (8)
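The instantaneous amplitude and phase of Equations (5)-(8) can be computed directly with scipy's analytic-signal routine; a pure 8 Hz cosine stands in here for an extracted IMF.

```python
import numpy as np
from scipy.signal import hilbert

fs = 200
t = np.arange(2 * fs) / fs
imf = np.cos(2 * np.pi * 8 * t)          # stand-in for an extracted IMF_i(t)

analytic = hilbert(imf)                  # AIMF_i(t) = IMF_i(t) + j*H[IMF_i](t)
amplitude = np.abs(analytic)             # a_i(t), Eq. (6)
phase = np.unwrap(np.angle(analytic))    # theta_i(t), Eq. (6)
inst_freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
```

For this constant-amplitude tone, `amplitude` is flat at 1 and `inst_freq` recovers the 8 Hz oscillation, matching the analytic-form interpretation above.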
Zhang et al. found that the first four IMFs account for most of the cumulative variance contribution rate [36]. Thus, in our approach, the sum of only the first four IMFs is used as the preprocessed EEG signal.
Figure 3 illustrates the empirical mode decomposition of an EEG signal; the last IMFs and the residue capture the signal trend.
Figure 4 shows an example of the preprocessed EEG signal obtained as the sum of the first four IMFs.
2.4. Bidirectional LSTM (BiLSTM)
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network originally designed to solve the vanishing gradient problem that recurrent neural networks face when dealing with long sequences [37]. An LSTM network's architecture consists of a layer of LSTM units followed by a standard feedforward network.
Figure 5 shows a single functional block of an LSTM unit.
From a general perspective, an LSTM unit operates as follows: let x_t be the current input at time t; then the output of the input gate i_t is

i_t = σ(W_i x_t + U_i h_{t−1} + b_i),   (9)

where W_i and U_i are weight matrices, h_{t−1} is the previous hidden state of the unit, and b_i is the bias vector. The function σ is a sigmoid function used for gating. Similarly, the output of the forget gate f_t is computed as

f_t = σ(W_f x_t + U_f h_{t−1} + b_f).   (10)

Finally, the outputs of the output gate o_t and the cell state c_t, together with the hidden state h_t, are

o_t = σ(W_o x_t + U_o h_{t−1} + b_o),   (11)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c x_t + U_c h_{t−1} + b_c),   (12)
h_t = o_t ⊙ tanh(c_t),   (13)

where ⊙ is the Hadamard product.
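The gate equations above can be checked with a minimal numpy implementation of one LSTM step; the weight matrices here are random placeholders for illustration, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations above."""
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])  # input gate
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])  # forget gate
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])  # output gate
    g_t = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # candidate state
    c_t = f_t * c_prev + i_t * g_t     # cell state (Hadamard products)
    h_t = o_t * np.tanh(c_t)           # hidden state
    return h_t, c_t

# Random placeholder parameters: 4 inputs (channels), 12 memory units
rng = np.random.default_rng(0)
n_in, n_hid = 4, 12
p = {f"W{g}": rng.normal(size=(n_hid, n_in)) for g in "ifoc"}
p |= {f"U{g}": rng.normal(size=(n_hid, n_hid)) for g in "ifoc"}
p |= {f"b{g}": np.zeros(n_hid) for g in "ifoc"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), p)
```

Because h_t = o_t ⊙ tanh(c_t) with o_t in (0, 1), every component of the hidden state is bounded in (−1, 1).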
A BiLSTM consists of two parallel LSTM layers: one for the forward direction and one for the backward direction [37,38]. Because the input is processed in both directions, BiLSTMs extract more information from the input, improving the contextual information available and thus making better predictions than LSTMs. As a result, BiLSTMs present faster convergence and higher accuracy than LSTMs [39].
Figure 6 presents the BiLSTM architecture, consisting of two LSTM layers that keep the past and future context at any time step of the sequence. The outputs of the two LSTMs are combined according to the following equation:

y_t = h_t^f ⊕ h_t^b,   (14)

where h_t^f and h_t^b are the outputs of the forward and backward LSTMs, respectively, and ⊕ denotes the combination operator (e.g., concatenation).
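The bidirectional combination can be sketched as follows; to keep the sketch short, a simple tanh-RNN cell stands in for the LSTM unit, since only the forward/backward pass and the per-time-step concatenation are being illustrated.

```python
import numpy as np

def rnn_pass(X, Wx, Wh):
    """Hidden states for one direction over the sequence X (T, n_in)."""
    h = np.zeros(Wh.shape[0])
    out = []
    for x_t in X:
        h = np.tanh(Wx @ x_t + Wh @ h)  # simple recurrent cell (LSTM stand-in)
        out.append(h)
    return np.array(out)

def bilstm_like(X, Wx_f, Wh_f, Wx_b, Wh_b):
    Hf = rnn_pass(X, Wx_f, Wh_f)              # forward hidden states
    Hb = rnn_pass(X[::-1], Wx_b, Wh_b)[::-1]  # backward states, re-aligned in time
    return np.concatenate([Hf, Hb], axis=1)   # y_t = h_t^f (+) h_t^b

# Usage: a 170-step, 4-channel sequence with 12 units per direction
rng = np.random.default_rng(1)
W, n_in, n_hid = 170, 4, 12
X = rng.normal(size=(W, n_in))
Y = bilstm_like(
    X,
    rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)),
    rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)),
)
```

Each output row combines past context (forward pass) and future context (backward pass) for that time step, yielding 2 × 12 = 24 features per step.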
2.5. Proposed Architecture
A feature matrix constituted by each preprocessed EEG signal is applied to the input of the first BiLSTM layer. A stacked architecture was chosen to learn the complexity of the features extracted by the BiLSTM network. To determine the number of BiLSTM layers to implement, several experiments were completed with 2, 3, and 4 stacked layers, and the configuration with 3 stacked BiLSTM layers provided the most accurate classification.
Each BiLSTM layer consists of 12 memory units, as illustrated in Figure 7. The output of the stacked BiLSTMs is a matrix with one row per time step. This matrix is then flattened into a vector, which is the input to a dense layer. The value of W is 170 for the 200 Hz signals, whereas for the 1000 Hz signals it is 850.
The dense layer uses the SoftMax activation function to classify the representative features into the class labels.
The batch size for all network training runs was set to 330. The model was implemented in Python 3.6 using Keras and TensorFlow. The loss function was the Categorical Cross-Entropy, the learning algorithm was the Nadam optimizer, and Accuracy was the additional metric computed during training.
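A minimal Keras sketch of the described architecture follows, assuming the 200 Hz case (W = 170 time steps) and the four selected channels (C3, Cz, P3, Pz) as input features; the layer sizes, loss, and optimizer follow the text, while the input shape is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

W, CHANNELS, N_CLASSES = 170, 4, 5   # 170 steps, 4 electrodes, 5 fingers

model = models.Sequential([
    # three stacked BiLSTM layers with 12 memory units per direction;
    # return_sequences=True keeps the whole sequence, so the last layer
    # outputs a (W, 24) matrix that is flattened before the dense layer
    layers.Bidirectional(layers.LSTM(12, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(12, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(12, return_sequences=True)),
    layers.Flatten(),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="nadam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Build the model by passing a dummy batch of two sequences
out = model(tf.zeros((2, W, CHANNELS)))
```

Keeping the whole output sequence (rather than only the last state) before the dense layer matches the design choice discussed in the results section.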
Moreover, the Cyclical Learning Rate (CLR) method [40] was used to accelerate the convergence of the training algorithm. Another reason for using CLR is that it can help the training algorithm escape from local minima. So, the minimum and maximum learning rates were set to and , respectively. The step size was 8 times the number of iterations per epoch.
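The triangular CLR policy of [40] can be sketched as a simple schedule function; the `base_lr`, `max_lr`, and `step_size` values below are placeholders, not the values used in this study.

```python
import math

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate (Smith's CLR policy).

    base_lr/max_lr are placeholder bounds; step_size is the half-cycle
    length in iterations (set to 8x the iterations per epoch in the text).
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    # linearly rise from base_lr to max_lr and back within each cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

The rate starts at `base_lr`, peaks at `max_lr` after `step_size` iterations, and returns to `base_lr` at the end of each cycle, which is what lets training periodically take larger steps and escape shallow local minima.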
The complete model was trained for 300 epochs on a Windows 10 desktop equipped with an NVIDIA GTX 1080 Ti GPU; each training was repeated at least twice.
3. Results and Discussion
This study implements the subject-dependent approach, where signals from a single subject are classified to decode right-hand finger imagery movements. The subject signals were recorded at one of two sampling frequencies, 1000 Hz or 200 Hz. The EEG signals used in the experiments correspond to the electrodes C3, Cz, P3, and Pz, which are presumably involved in the movements of the right-hand fingers [26].
Hence, k-fold Cross-Validation (CV) was used to assess the training performance of the model. Considering that the dataset is relatively small, the value of k was set to 200, so the dataset was split into 200 disjoint subsets of equal size. Then, for each training run, one different subset is taken as the test data while the remaining 199 subsets are used as learning data. The performance is taken as the average of the 200 testing accuracies. The number of samples in the datasets was between 940 and 1917.
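The 200-fold split can be sketched with plain numpy index partitioning; the dataset size (1000 samples) is illustrative, and the train/evaluate calls inside the loop are placeholders for the actual model.

```python
import numpy as np

def kfold_indices(n_samples, k=200, seed=0):
    """Split shuffled sample indices into k disjoint, near-equal folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

n_samples = 1000                     # illustrative; real sets had 940-1917
folds = kfold_indices(n_samples, k=200)
for i, test_idx in enumerate(folds):
    # all other folds form the learning data for this run
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # placeholder: fit on train_idx, evaluate on test_idx, record accuracy;
    # the reported performance is the mean of the 200 test accuracies
    assert len(train_idx) + len(test_idx) == n_samples
```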
Table 2 shows the model accuracy for different subjects and sampling frequencies.
As a result, the highest testing accuracy (76.13%) for signals at 1000 Hz corresponds to subject F, while the lowest accuracy (66.0%) corresponds to subject I. The highest testing accuracy (82.26%) for signals at 200 Hz corresponds to subject C, while the lowest accuracy (75.2%) corresponds to subject B.
Table 3 shows the impact of the number of BiLSTM layers on the accuracy of the proposed method. These results show that accuracy reaches a maximum at three BiLSTM layers. Therefore, the configuration of three BiLSTM layers was used in all tests for this study.
Table 4 shows the number of model parameters for the 1000 Hz and 200 Hz cases. Note that the last BiLSTM layer outputs the whole processed sequence; the proposed architecture thus differs from other architectures, which use only the last state of the last BiLSTM layer for classification.
The results obtained by the presented approach outperform those reported in [18], where an average accuracy of 43% was achieved in decoding the five finger movements. In that work, a Support Vector Machine (SVM) was used to classify the MI-EEG signals of every single subject using only the C3 channel.
The network model was also trained and tested on the samples of all subjects (A, B, C, and F) to determine the behavior of the accuracy across samples from different subjects. For that purpose, the network model was trained each time using data from the four subjects (A, B, C, and F) for one of the two available sampling frequencies: 1000 Hz or 200 Hz.
As a result, the proposed model achieved 80.04% and 82.26% accuracy for the 1000 Hz and 200 Hz signals, respectively. These results outperform those obtained by Kaya et al., who, considering all 13 subjects, achieved an accuracy between 40% and 60% for five subjects and between 20% and 40% for three subjects [18].
Table 5 shows a comparison with other results from the literature for subjects A, B, C, and F, with 200 Hz and 1000 Hz signals.
The choice of subjects is mainly because, for those subjects, the dataset contains both 200 Hz and 1000 Hz signals. The results obtained by Anam et al. with subject-dependent classification are slightly lower than those obtained by our approach for the 200 Hz signals [41]. However, for the 1000 Hz signals, the method of Anam et al. performs better than the proposed framework, except for subject C. This can be explained by the number of parameters used in the 1000 Hz case, which is about five times the number used for 200 Hz, as shown in Table 4; hence, the proposed model presented overfitting issues in this case. EEGNet [21] was also trained using the proposed preprocessing method for comparison purposes; its accuracy was very close to that of the proposed approach. Finally, the EMD-based preprocessing method resulted in about 32.6% faster training convergence in all tests.