Motor Imaging EEG Signal Recognition of ResNet18 Network Based on Deformable Convolution

Du, Xiuli; Li, Kai; Lv, Yana; Qiu, Shaoming

doi:10.3390/electronics11223674

Open AccessArticle

Motor Imaging EEG Signal Recognition of ResNet18 Network Based on Deformable Convolution

by

Xiuli Du

^1,2,*,

Kai Li

^1,2,

Yana Lv

^1,2 and

Shaoming Qiu

^1,2

¹

Communication and Network Laboratory, Dalian University, Dalian 116622, China

²

School of Information Engineering, Dalian University, Dalian 116622, China

^*

Author to whom correspondence should be addressed.

Electronics 2022, 11(22), 3674; https://doi.org/10.3390/electronics11223674

Submission received: 5 October 2022 / Revised: 31 October 2022 / Accepted: 1 November 2022 / Published: 10 November 2022

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

EEG signals with a weak amplitude, complex background noise, randomness, significant individual differences, and small data volume lead to insufficient feature extraction and low classification accuracy. Spurred by these concerns, this paper proposes a motor imaging EEG signal classification method based on fusing the improved ResNet18 network with the deformable convolutional network (DCN). Specifically, the original signal’s local airspace characteristics are enhanced by the common spatial patterns (CSP), and the time-frequency domain characteristics are displayed using the short-time Fourier transform (STFT). Then, the signal is converted into a time-frequency map, where a deformable convolution is applied to capture the contour characteristics of the time-frequency map. This strategy solves the problems of traditional convolution related to hard rules, i.e., the convolutional kernel shape can only be a square or rectangular core and cannot be dynamically changed according to the recognition target, resulting in a low recognition rate, prohibiting the network from extracting hidden features and affording enhanced identification and classification. Experimental results demonstrate that our method attains an average classification accuracy on a two-classification and two four-classification motor imaging EEG signals of 90.30%, 86.50%, and 88.08%, respectively, which is much higher than current work, proving our method’s effectiveness.

Keywords:

motion imagining; time-frequency plot; deformable convolution; signal recognition

1. Introduction

With the advancement of wireless transmission, machine learning, artificial intelligence, and other technologies, the research of brain-computer interface (BCI) technology based on electroencephalography (EEG) has increased accordingly as a transformative communication and control technology [1]. In recent years, BCI technology has emerged in many fields, ranging from the medical field, with significant advantages, to education, military, entertainment, smart homes, and other aspects of everyday life.

EEG signals have the characteristics of weak amplitude, complex background noise, randomness, significant individual differences, and contain a lot of time and space information, making it challenging to achieve reliable recognition [2,3,4,5]. Therefore, accurate and reliable EEG signal recognition [6] will help to improve the accuracy and reliability of the system. At present, feature extraction has many applications in image, video, speech, and text, with deep learning-based networks widely outperforming traditional algorithms. Based on its successful application in numerous fields, deep learning technology has also been applied for EEG signal recognition and analysis [7]. Indeed, to date, several researchers worldwide have proposed feature extraction and classification methods for various EEG signals. For instance, Arias-Vergara et al. [8] propose a methodology to combine three different time-frequency representations of the signals by computing continuous wavelet transform, Mel-spectrograms, and Gammatone spectrograms and combining them into 3D-channel spectrograms to analyze speech in the following two different applications: (1) automatic detection of speech deficits in cochlear implant users and (2) phoneme class recognition to extract phone-attribute features. Lopac et al. [9] propose a method for the classification of noisy non-stationary time-series signals based on Cohen’s class of their time-frequency representations (TFRs) and deep learning algorithms. This study suggests that using alternative TFRs of Cohen’s class can improve the deep learning-based detection of non-stationary GW signals in an intensive noise environment. The TFR-CNN models achieve the values of the classification accuracy of up to 97.10%. Khare et al. [10] propose a method to convert the filtered EEG signals into an image using a time-frequency representation. A smoothed pseudo-Wigner–Ville distribution is used to transform time-domain EEG signals into images. These images are fed to pretrained AlexNet, ResNet50, and VGG16 along with configurable CNN. The accuracy scores of 90.98%, 91.91%, 92.71%, and 93.01% obtained by AlexNet, ResNet50, VGG16, and configurable CNN show that the proposed method is the best among other existing methods. Xu et al. [11] constructed a 2-layer convolutional neural network as a classifier and identified the time-frequency images of C3, Cz, and C4 channels obtained by the wavelet transform as inputs to improve classification accuracy. Liu et al. [12] extracted and classified the characteristics of motor imaging EEG signals based on PS0-CSP-SVM, which first relied on the particle swarm optimization algorithm to obtain the optimal frequency bands and periods of different individuals and then used the support vector machine for classification. Jiyue et al. [13] suggested a feature extraction method based on the optimal region common space mode (ORCSP) for different objects and selected the region with the highest separability based on Euler distance and variance ratio. Finally, a support vector machine scheme acted was employed for classification. Shan et al. [14] combined Relief’s statistical correlation principle and the iterative idea of a sequential backward selection algorithm. In this work, the EEG channel was selected, and the correlation coefficient method was used for classification, with the channel that obtained the optimal classification accuracy identified as the optimal channel. Feng et al. [15] proposed a channel selection method for the multi-frequency band (CSP-R-MF) based on multi-band common spatial pattern filter sequencing, combined with multi-band signal decomposition filtering and the CSP-Rank method to select channels. Jin et al. [16] proposed a correlation-based channel selection (CCS) based on Pearson’s correlation coefficient, which used the Pearson’s correlation coefficient to select channels associated with MI tasks and then performed regularized cospace mode (RCSP) [17] feature extraction on these channels. These models provide the theory and idea of motor imagination EEG signal feature extraction and classification, and the effect is considerable. However, the time-frequency domain and airspace feature extraction of the signal is insufficient, especially in subjects with inconspicuous features.

The network structure involving a deformable convolution module increases the offset, while the offset learning and the size and position of the deformable convolutional kernel can be dynamically adjusted according to the image content to be recognized. This intuitive effect affords the convolutional kernel to adaptively sample points in different positions according to the image content, depending on the geometric deformation of the time-frequency map shape, size, and other geometric deformations due to expanding the data set or other individual differences. This adaptive scheme achieves appealing results in extracting image features. In EEG signal recognition, the ResNet18 is typically used to solve the performance degradation problem by introducing residual blocks while ensuring the network’s performance. However, to the best of our knowledge, there is no research combining deformable convolution and ResNet18 for EEG signal recognition. Based on the above analysis, this paper proposes a classification network that integrates the ResNet18 network and deformable convolution, namely, the DCN-ResNet18 network, to fully extract the time-frequency domain information of the time-frequency map, mine the hidden features of EEG signals, and further improve the accuracy of motor imaging EEG signal classification.

The remainder of this paper is organized as follows: Section 2 introduces the principles and structure of deformable convolutional networks. Section 3 introduces the ResNet18 network and the structure of the DCN-ResNet18 network proposed in this paper. Section 4 analyzes and discusses the experimental results. Finally, Section 5 presents the conclusion and future work.

2. Deformable Convolutional Network

Deformable convolution [18] is a new convolution method proposed by Dai et al. in 2017. The regular lattice sampling in the standard convolution makes it difficult for the network to adapt to geometric deformation, while in the deformable convolution, an offset variable is added to the position of each sampling point. These variables shift the convolutional kernel to effectively sample the data without being limited by a fixed shape, thereby significantly increasing the receptive field of the convolutional operations and enhancing the convolutional neural networks’ modeling ability to irregular targets.

The model in this paper uses a convolutional kernel of size 3 × 3 and defines R as the sampling region of the convolutional kernel described as follows:

R = {(- 1, - 1), (- 1, 0), \dots (0, 1), (1, 1)}

(1)

For a traditional convolution operation, each position

p_{0}

on the output feature plot y is calculated as Equation (2):

y (p_{0}) = \sum_{P_{n} \in R} w (p_{n}) \cdot x (p_{0} + p_{n})

(2)

where

p_{n}

enumerates the position in

R

.

For the deformable convolution operations, each position

p_{0}

on the output feature plot y is calculated as Equation (3):

y (p_{0}) = \sum_{P_{n} \in R} w (p_{n}) \cdot x (p_{0} + p_{n} + Δ p_{n})

(3)

Sampling is now performed on irregular and offset positions

P n + Δ P n

. Since the vectors are usually decimals, Equation (3) is implemented through bilinear interpolation expressed in Equation (4).

x (p) = \sum_{q} G (q, p) \cdot x (q)

(4)

where: p is the position after the sample point offset, q is an integer grid point, and

G (q, p)

is the sample point in integer form a bilinear interpolation operation. The bilinear interpolation formula is

G (q, p) = g (q_{x}, p_{x}) . g (q_{y}, p_{y})

, with

g (a, b) = \max (0, 1 - | a - b |)

.

The deformable convolution feature extraction operation on the input image overcomes the traditional convolution operation’s hard rules by changing the receptive field’s fixed mode via introducing offsets and different weights. Thereby significantly increasing the range of the receptive field and making it better converging in the characterization area, improving the network’s adaptive ability to change the image con-tour features, and capturing features without redundant information. After the original EEG data is converted into a two-dimensional time-frequency map, the contour features of the time-frequency map are more prominent, affording a more comprehensive and accurate feature extraction from the EEG signal. Figure 1 compares the traditional and deformable convolutions in sampling, respectively, where the deformable convolution includes deformation methods such as scale change, expansion, and rotation.

The operation mechanism of DCN is illustrated in Figure 2. DCN first obtains a set of prediction results for the convolutional kernel offset through a convolutional operation acting on the input feature map. The dimensions of this offset feature map remain the same as the input feature map. The number of channels in the offset feature map is 2N, where 2 means that each offset is (x, y) having two values, and N is the number of pixels in the convolutional kernel. For example, as shown in Figure 3, when the size of the convolutional kernel is 3 × 3, then N = 9. The original 3 × 3 convolutional layer has the same hyper parameters as the 3 × 3 convolutional layer in the DCN, but the latter has an output channel of 18 due to the need to learn offsets, and the channel size of 18 corresponds to these 9 sample points with a two-dimensional offset.

3. DCN-ResNet18 Network

3.1. ResNet18 Network

A deep learning network’s final classification recognition accuracy is also affected by its depth. However, simply stacking networks deteriorates the network’s effectiveness as the depth increases, raises the training difficulty, the nonlinear factors increase, and the network cannot easily converge. Given the above problems, He et al. proposed ResNet [19], which comprises convolutional layers, pooling layers, normalized layers, residual structures, and fully connected layers.

When building a convolutional network, the higher the depth of the network, the richer the level of features that can be extracted. Therefore, in general, we tend to use a deeper network structure in order to achieve higher-level features. However, when we using deep network structures, the following three problems will arise: gradient disappearance, gradient explosion problem, and network degradation problem. The introduction of residual structure is the characteristic of ResNet networks. When we design the network structure, we do not know how many levels of network is the optimal network structure, so there will be redundant layers, before introducing the residual structure, we want this layer to learn parameters to be able to meet

h (x) = x

; that is, the input is

x

, after passing through this redundant layer, the output is still

x

, but it is more difficult to learn

h (x) = x

; that is, the parameters of this layer when the identity map is used. The residual structure avoids learning the parameters of the layer’s identity map, using the structure as shown above, so that

h (x) = F (x) + x

; here,

F (x)

, we call the residual term, and we find that for this redundant layer to be equally mapped, we only need to learn

F (x) = 0

. Learning

F (x) = 0

is simpler than learning

h (x) = x

, because the parameter initialization in each layer of the network is generally biased towards 0, so that the redundant layer learns the updated parameters of

F (x) = 0

faster convergence than updating the parameters of the network layer. The residual structure is shown in Figure 4.

By introducing residual structures, the ResNet18 network solves the gradient vanishing, gradient explosion, and network performance degradation problems owing to many network layers, making the network easy to optimize, thereby protecting the integrity of information during feature extraction. However, the ResNet18 network still suffers from vague classification boundaries and insufficient accuracy when performing classification tasks. The ResNet18 model architecture is illustrated in Figure 5.

3.2. Improved ResNet18 Network Model Building

This study improves ResNet18 by partially replacing the traditional convolution with deformable convolution. In the convolutional calculation process, the classic deep convolutional neural network considers only the target center pixels, and its convolutional kernel matrix size is within the pixels’ range used for the calculations. As the convolutional depth increases, the mapping of the receptive field is still rectangular and inconsistent with the actual variable time-frequency map contour features, resulting in poor target recognition. Compared with the traditional convolution, the deformable convolution improves the classification accuracy.

The structural pairs between the proposed network and the ResNet18 network in this study are shown in Table 1, and the schematic diagram of the improved ResNet18 network structure is shown in Figure 6, mainly comprising an input layer, convolutional layers, pooling layers, normalization layers, residual structures, fully connected layers, and a softmax classification layer. As can be seen from the network structure diagram, the feature extraction network in this paper is based on ResNet18, which replaces the last two traditional convolutions in the second and third residual structures with deformable convolutions to form a DCN-ResNet18 feature extraction network.sd.

Based on the above, the block diagram of the developed motor imaging EEG signal recognition is depicted in Figure 7.

4. Experimental Simulation and Analysis

4.1. Experimental Test Dataset

The EEG signal recognition and classification based on motor imagination is a hot research topic, with the bi-classification and tetra-classification MI-EEG datasets being more extensive and mature. Therefore, this paper selects these three datasets of dataset 2b [20], dataset 2a [21], and dataset 3a [22] from the International Brain-Computer Interface Competition for research.

BCIC IV dataset 2b

The experimental data are from the BCI Competition IV. The 2b dataset was collected by the Brain-Computer Interface Laboratory of the Technical University of Graz. This dataset involves EEG data recordings of nine subjects’ imagined left-handed and right-handed movements. Each subject collected 5 groups of EEG data through the three electrode channels of C3, Cz, and C4, and the first 3 groups are the training data, including 400 exercise imaging experiments. The last 2 groups are the test data containing 320 exercise imaging experiments. Among the 5 groups of EEG data, the first 2 groups have 120 experiments each, and the neurofeedback data without recognition results. The last 3 groups have 160 experiments each, which are neurofeedback data with recognition results. At the beginning of each session, a nearly 5-min segment of EOG test data is included to assess how much the eye signal interferes with the EEG signal, thereby removing artifacts in the EEG signal. The experimental paradigms without feedback and with feedback are illustrated in Figure 8 and Figure 9. The neurofeedback experiment with recognition results considers displaying a gray humanoid face prompt to prepare for the experiment within 2 s before the start. The task prompt appears in the 3rd second, and then the subject performs the corresponding motor imagination task according to the task prompt. The system will either give a green smiley face mark moving in the correct imaginary direction on the screen according to the recognition result or a red bitter face marker moving in the wrong imagined direction. The EEG data recorded by the channel is filtered by a bandpass filter ranging from 0.5Hz to 100 Hz, the sampling frequency is 250 Hz, and a 50 Hz notch filter eliminates the power frequency interference.

BCIC IV dataset 2a

This dataset (http://www.bbci.de/competition/iv/, (accessed on 1 January 2008)) has EEG data from nine subjects and four different motor imaging tasks, namely, motor imaging of the left hand (first type), right hand (second type), feet (third class), and tongue (fourth class). Each subject performed two acquisition phases at a sampling frequency of 250 Hz, each comprising 6 rounds of testing, where each testing round has 48 experiments, 12 times per class in four categories, and a total of 288 experiments per subject. Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG. At the beginning of each session, a recording of approximately 5 min was performed to estimate the EOG influence. The experimental paradigm for the signal acquisition is depicted in Figure 10.

BCIC III dataset 3a

This dataset (http://www.bbci.de/competition/iii/ (accessed on 1 January 2005)) has three subjects (k3b, l1b, k6b) who had different experiences with motion images, k3b performed 90 experiments for each motor imagination, for a total of 360 experiments, and l1b and k6b performed 60 experiments for each motor imagination, for a total of 240 experiments. The EEG was sampled at 250 Hz and filtered between 1 and 50 Hz with the notch filter turned on, recording 60 EEG channels, and the time of motion imagination was 4s. The timing diagram of the whole testing process is presented in Figure 11.

4.2. Data Preprocessing

The co-spatial [23] mode is an airspace filtering feature extraction algorithm for two classification tasks, which can extract the spatial distribution components of each class from the multi-channel brain-computer interface data. The co-spatial mode improves the signal-to-noise ratio of the EEG signals, enhances the local activity, and weakens the common noise in the electrodes to distinguish different categories to the greatest extent. The one-to-many co-spatial model is a special case of the two-classification co-spatial model. When performing multi-classification tasks, the one-to-many co-spatial model sets a class as a positive class, and the rest of the classes are treated as an anti-class, converting the multi-classification task into a two-classification task. The basic principle is to use the diagonalization of the matrix to find an optimal set of spatial filters for projection so that the variance difference between the two types of signals is maximized, and an eigenvector with a high degree of differentiation is obtained. The spatial filter projection according to Equations (5) and (6) are as follows:

B_{i} = U_{1, 2 m}^{T} P_{i}

(5)

Z^{j} = P_{j} X

(6)

where

B_{i}

represents the spatial filtering in class

i

,

U_{1, 2 m}^{T}

is a feature vector in a certain mode type,

P_{i}

denotes the whitening matrix in this pattern type,

X \in R^{N \times M}

is the

N \times M

dimension matrix of each MI-EEG task sample, with

N

the number of electrodes and

M

the number of sampling points per electrode.

Z^{j} \in R^{2 m \times M}

is the filtered signal in class

j

and

2 m

is the number of filtered channels.

Considering the two-classification dataset 2b, when the subject is performing the left-hand and right-hand motion imaging tasks, the energy of the associated motor sensory region rhythm (from 8 to 13 Hz) and the rhythm (from 17 to 30 Hz) on the contralateral side of the brain decreases. Additionally, the rhythm and rhythm energy of the related motor sensory region on the ipsilateral side increases, presenting the event-related synchronous ERD and event-dependent desynchronized ERS phenomenon. Thus, for the 2 s-length EEG signal collected by each electrode, we pass it through the co-spatial mode and then use STFT [24] to obtain a 257 × 32 time-frequency plot. The STFT adopts a Hamming window with a length of 64 and a time interval of 14 and then extracts the 8–13 Hz band and the 17–30 Hz band for the obtained time-frequency map to obtain a left-right hand two-dimensional time-frequency pattern of 12 × 32 and 29 × 32, respectively. In addition, to ensure the consistency of the two frequency bands, the time-frequency plot of the 17–30 Hz band is adjusted to 12 × 32 through cubic interpolation. Finally, in this paper, all frequency bands of the three electrodes are combined to form a time-frequency pattern of (3 × 2 × 12) × 32 size (i.e., 72 × 32, as shown in Figure 12), and then the time-frequency plot conversion size is 224 × 224 using image augmentation technology as the input sample size of the network. For each subject, dataset 2b consists of five stages; the first three stages contain 400 samples for training and the last two stages contain 320 samples for testing.

For the four-classification datasets 2a and 3a, the EEG bands related to the collected EEG signal and motion imaging are mainly concentratedin 8Hz to 30Hz. Thus, after segmenting the original data, the EEGLAB toolbox built into Matlab’s FIR filter is used for filtering, followed by a one-to-many common space mode to distinguish different categories to the greatest extent. Finally, the short-term Fourier transform is used to convert the original signal into a two-dimensional time-frequency map as an input to the network (Figure 13). For dataset 2a, it consists of two phases, with 288 samples from the first stage for training and 288 samples from the latter stage for testing; for dataset 3a, the experiments performed by each subject are evenly divided into test sets and training sets in a 1-to-1 ratio.

The EEG signal contains sufficient airspace features in the time-frequency domain. Through the common space, the one-to-many common space mode and the short-term Fourier transform of the original signal characteristics are reflected in the two-dimensional time-frequency map. However, the sample size is small, and to improve the adaptability and generalization of the network model, the obtained two-dimensional time-frequency map is expanded by adjusting the brightness and applying horizontal mirroring, flipping, and pretzel noise. The 10-fold expanded data set is then used as the input of the neural network, enhancing the network’s feature extraction ability and entering deeper into the motion to extract the hidden dynamic information in EEG signals.

4.3. Simulation Verification and Analysis of Results

In this paper, the classification accuracy rate commonly used in the classification of motion imaging is used as the evaluation index of the model. The accuracy rate is the proportion of the motion imagination that is correctly classified, that is, the ratio of the number of correctly classified samples in all classes to the total number of samples, and the specific calculation formula is as follows:

A c c u r a c y = \frac{T P + T N}{T P + F N + F P + T N}

(7)

TP represents the number of instances that are correctly classified as positive examples—that is, the number of instances that are actually positive and are classified as positive by the classifier; FP represents the number of instances that are incorrectly classified as positive examples, those that are actually negative but are classified as positive by the classifier; FN represents the number of instances that are incorrectly classified as negative, that is, the number of instances that are actually positive but are classified as negative by the classifier; TN stands for the number of instances that are correctly classified as negative, that is, the number of instances that are actually negative and are classified as negative by the classifier.

For neural networks, different parameter settings correspond to different network structures and affect a neural network’s performance. In this paper, changing the size of the parameters affects the size of the convolutional kernels and, thus, the deformable convolutional layers. The corresponding network classification effect is verified on dataset 2b, dataset 2a, and dataset 3a, with the experimental results reported in Table 2, Table 3 and Table 4.

The experimental results in Table 2, Table 3 and Table 4 highlight that when the deformable convolution kernel size of the residual structures 2 and 3 is 3 × 3, the model’s identification accuracy is the highest. This is because the dimension of the EEG signal data after the preprocessing proposed in this paper is small. Thus, the large convolutional nucleus easily covers the entire two-dimensional image, reducing the resolution of the local features extracted by the convolutions. The latter does not afford to capture the valuable features affecting the recognition effect. However, the local area covered by the small convolutional kernel is small; therefore, the convolution can only extract a small local feature, improving the recognition effect.

After determining the network parameters based on the previous comparative analysis, we compare the performance of our model against ResNet18 on nine subjects in the two-classification dataset 2b. Additionally, we compare their classification accuracy on the test set to demonstrate the proposed model’s performance further. The corresponding experimental results are reported in Table 5. The numbers in parentheses indicate the residual structure in which the deformable convolution replaces the ordinary convolution. For example, (2, 3), represents substitution in the second and third residual structures.

The suggested neural network recognition method is challenged against other binary classification MI-EEG recognition methods on data set 2b, with Table 6 reporting the average recognition accuracy. In order to evaluate the recognition performance of the left and right-hand movement imaging EEG signals, we compare the proposed method with other approaches, including KLD (Wang et al., 2020) [25], CEMD-MSCNN (Tang et al., 2020) [26], and DBN (Chu et al., 2020) [27], which have been widely used in recent years. In addition, to reflect the diversity of the competitor methods, we also compared the methods used by the top three BCI IV competitions (Chin, Gan, and Coyle in order of ranking from highest to lowest). The recognition rates of the test samples of all subjects under the above method are presented in Table 5. The latter table reveals that the average recognition rate of the proposed method is 89.3%, better than other methods, indicating that deformable convolution has great advantages in fully excavating the time-frequency information of EEG signals. For a single subject, the recognition rate of our method is 98.4%. The average recognition accuracy of all subjects obtained by our method is significantly improved compared to competitor algorithms. This is more evident for subjects with low classification accuracy, where the developed recognition method fusing ResNet18 and deformable convolution attains an appealing performance in recognizing binary motor imaging EEG signals.

We further compare the proposed neural network recognition method with four classification MI-EEG recognition methods on dataset 2a, and the average recognition accuracy of the subjects is presented in Table 7. In [28], Li et al. develop a dense feature fusion convolutional neural network (DFFN), which correlates the adjacent layer and cross-layer features to reduce the information loss during the convolutional operations and considers the network’s local and global characteristics, obtaining an average accuracy of 79.90% on the 2a dataset. Lawhern et al. [29] propose a compact convolutional neural network named EEGNET, which uses deep and separable convolution to construct an EEG classification model, obtaining an average accuracy of 73.42%. A CNN-LSTM [30] is a hybrid deep neural network based on OVR-FBCSP, where a CNN and long short-term memory (LSTM) decode the motor-imagined EEG signals, achieving an accuracy of 84.15%. In [31], Gaur et al. suggest a new filtering method based on multivariate empirical mode decomposition (MEMD) based on specific topics, namely, SS-MEMDBF, which extracts cross-channel information and locates specific frequency information, achieving a 79.94% accuracy rate. Wu et al. [32] suggest a parallel multi-scale filter bank convolutional neural network (MSFBCNN), which extracts temporal and spatial features from EEG and achieves an accuracy rate of 74.91%. Liu et al. [33] present a parallel spatial-temporal self-attention-based convolutional neural network, which uses the spatial self-attention module to capture the spatial dependencies between the channels of MI EEG signals and achieves an accuracy rate of 75.01%. Song et al. [34] propose a novel deep learning method DMTL-BCI based on the multi-task learning framework for EEG-based classification tasks. The proposed model is proposed to improve the classification performance with limited EEG data, obtaining an average accuracy of 75.21%. Amin et al. [35] propose a multi-layer CNNs method for fusing CNNs with different characteristics and architectures to improve EEG MI classification accuracy. The proposed MCNN method achieves 75.7% on dataset 2a. Ingolfsson et al. [36] propose EEG-TCNet, a novel temporal convolutional network (TCN) that achieves outstanding accuracy while requiring few trainable parameters and improves the accuracy to 77.35%. The average recognition accuracy of all subjects utilizing this method is 86.50%, which is significantly improved compared with other algorithms, especially in subjects with low classification accuracy. Generally, ResNet18 utilizing the deformable convolution proposed in this paper attains an appealing performance in multi-category EEG signal recognition.

In order to further verify the effectiveness of this method, we compared its recognition accuracy rate against the existing models on the BCIC III dataset3a, including Kernel-B2DDLPP (Zhao et al., 2019) [37], multi-branch-3D (Zhang et al., 2019) [38], with the experimental results reported in Table 8. The experimental results verify our model’s effectiveness in classifying EEG signal recognition in multi-classification motor imaging.

5. Conclusions

This paper investigates the MI-EEG classification problem and suggests DCN-ResNet18, a ResNet18 network that utilizes deformable convolutions. Specifically, the EEG data source signal is preprocessed using the co-spatial or one-to-many co-spatial mode and the short-term Fourier transform to obtain a two-dimensional time-frequency map containing more comprehensive EEG signal characteristics. This preprocessed signal is used as input, and feature information hidden in the EEG signal is extracted adaptively using DCN-ResNet18. The deformable convolution employed can capture the contour features of the time-frequency map, solving the problems of traditional convolutions where features are not prominent and the expanded data set cannot afford adaptive changes when the image is scaled or rotated. Hence, the deformable convolution can effectively improve the network’s feature extraction ability. Several experiments demonstrate that ResNet18 utilizing deformable convolutions achieves an average accuracy of 90.30%, 86.50%, and 88.08% on the public datasets BCIC IV. dataset 2b, BCIC IV. dataset 2a, and BCIC III. dataset 3a, respectively, revealing that the proposed method is more effective in MI-EEG recognition classification than other methods. The classification accuracy of EEG signals is significantly increased, but the computational amount of the neural network is slightly increased, and efforts will be made to reduce the weight of the network in the future to reduce the number of parameters of the neural network.

Author Contributions

Conceptualization, X.D. and K.L.; methodology, X.D. and Y.L.; software, K.L. and S.Q.; validation, X.D. and K.L.; formal analysis, K.L. and Y.L.; investigation, K.L. and S.Q.; resources, X.D. and S.Q.; data curation, X.D. and K.L.; writing—original draft preparation, K.L. and Y.L.; writing—review and editing, X.D. and K.L.; visualization, K.L. and S.Q.; supervision, S.Q. and Y.L.; project administration, X.D. and K.L.; funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

Funding

The project is sponsored by “Liaoning BaiQianWan Talents Program”, grant number 2018921080.

Data Availability Statement

The processed data required to reproduce these findings cannot be shared as the data also forms part of an ongoing study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Jiang, G.; Zhao, C. A Review of EEG-based Brain-Computer Interface Development. Comput. Meas. Control 2022, 30, 1–8. [Google Scholar]
Wang, H.; Hu, J.; Wang, Y. A review of EEG signal processing methods. Comput. Age 2018, 13–15+19. [Google Scholar] [CrossRef]
Wang, D.; Tao, Q.; Zhang, X.; Wu, B.; Fang, J.; Lu, Z. Four Types of Expression-Assisted EEG Signal Recognition Methods Using Improved Cospatial Mode Algorithm. J. Xi’an Jiaotong Univ. 2022, 1–9. Available online: http://kns.cnki.net/kcms/detail/61.1069.T.20220822.1552.002.html (accessed on 30 October 2022).
Singh, A.; Hussain, A.A.; Lal, S.; Guesgen, H.W. A comprehensive review on critical issues and possible solutions of motor imagery based electroencephalography brain-computer interface. Sensors 2021, 21, 2173. [Google Scholar] [CrossRef] [PubMed]
Raza, H.; Chowdhury, A.; Bhattacharyya, S.; Samothrakis, S. Single-trial EEG classification with EEGNet and neural structured learning for improving BCI performance. In Proceedings of the IEEE International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
Yong, Y.; Zhang, H.; Cheng, Q.; Sun, G.; Yang, J. Hybrid brain-computer interface and its research progress. Comput. Meas. Control 2020, 28, 9–13. [Google Scholar]
Zhu, J. Multi-Perspective Clustering Model for Epilepsy EEG Signals; Jiangnan University: Wuxi, China, 2021. [Google Scholar]
Arias-Vergara, T.; Klumpp, P.; Vasquez-Correa, J.C.; Nöth, E.; Orozco-Arroyave, J.R.; Schuster, M. Multi-channel spectrograms for speech processing applications using deep learning methods. Pattern Anal. Appl. 2021, 24, 423–431. [Google Scholar] [CrossRef]
Lopac, N.; Hržić, F.; Vuksanović, I.P.; Lerga, J. Detection of Non-Stationary GW Signals in High Noise From Cohen’s Class of Time–Frequency Representations Using Deep Learning. IEEE Access 2021, 10, 2408–2428. [Google Scholar] [CrossRef]
Khare, S.K.; Bajaj, V. Time–frequency representation and convolutional neural network-based emotion recognition. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 2901–2909. [Google Scholar] [CrossRef]
Xu, B.; Zhang, L.; Song, A.; Wu, C.; Li, W.; Zhang, D.; Xu, G.; Li, H.; Zeng, H. Wavelet transform time-frequency image and convolutional network-based motor imagery EEG classification. IEEE Access 2018, 7, 6084–6093. [Google Scholar] [CrossRef]
Liu, B.; Cai, M.; Bo, Y.; Zhang, X. A Feature Extraction and Classification Algorithm of Motor Imaging EEG Signal Based on PSO-CSP-SVM. J. Cent. South Univ. 2020, 51, 2855–2866. [Google Scholar]
Ji, J.; She, Q.; Zhang, Q.; Meng, M. Classification method of motor imaginative EEG signals based on optimal regional cospatial mode. Chin. J. Sens. Technol. 2020, 33, 34–39. [Google Scholar]
Shan, H.; Zhu, S. Brain-computer interface channel selection based on Relief-SBS. Chin. J. Biomed. Eng. 2016, 33, 350–356. [Google Scholar]
Feng, J.; Jin, J.; Daly, I.; Zhou, J.; Niu, Y.; Wang, X.; Cichocki, A. An optimized channel selection method based on multifrequency CSP-rank for motor imagery-based BCI system. Comput. Intell. Neurosci. 2019, 2019, 8068357. [Google Scholar] [CrossRef] [Green Version]
Jin, J.; Miao, Y.; Daly, I.; Zuo, C.; Hu, D.; Cichocki, A. Correlation-based channel selection and regularized feature optimization for MI-based BCI. Neural Netw. 2019, 118, 262–270. [Google Scholar] [CrossRef] [PubMed]
Varsehi, H.; Firoozabadi, S.M.P. An EEG channel selection method for motor imagery based brain–computer interface and neurofeedback using Granger causality. Neural Netw. 2021, 133, 193–206. [Google Scholar] [CrossRef] [PubMed]
Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
Tangermann, M.; Müller, K.-R.; Aertsen, A.; Birbaumer, N.; Braun, C.; Brunner, C.; Leeb, R.; Mehring, C.; Miller, K.J.; Müller-Putz, G.; et al. Review of the BCI competition IV. Front. Neurosci. 2012, 6, 55. [Google Scholar] [CrossRef]
Brunner, C.; Leeb, R.; Müller-Putz, G.; Schlögl, A.; Pfurtscheller, G. BCI Competition 2008–Graz Data Set A. In Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces); Graz University of Technology: Graz, Austria, 2008; pp. 1–6. [Google Scholar]
Schlögl, A.; Pfurtscheller, G. Dataset IIIa: 4-Class EEG Data. BCI Compet III. 2005. Available online: https://www.bbci.de/competition/iii/ (accessed on 4 October 2022).
Koles, Z.J.; Lazar, M.S.; Zhou, S.Z. Spatial patterns underlying population differences in the background EEG. Brain Topogr. 1990, 2, 275–284. [Google Scholar] [CrossRef]
Shovon, T.H.; Al Nazi, Z.; Dash, S.; Hossain, M.F. September. Classification of motor imagery EEG signals with multi-input convolutional neural network by augmenting STFT. In Proceedings of the 2019 5th International Conference on Advances in Electrical Engineering (ICAEE), Dhaka, Bangladesh, 26–28 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 398–403. [Google Scholar]
Wang, J.; Feng, Z.; Ren, X.; Lu, N.; Luo, J.; Sun, L. Feature subset and time segment selection for the classification of EEG data based motor imagery. Biomed. Signal Process. Control 2020, 61, 102026. [Google Scholar] [CrossRef]
Tang, X.; Li, W.; Li, X.; Ma, W.; Dang, X. Motor imagery EEG recognition based on conditional optimization empirical mode decomposition and multi-scale convolutional neural network. Expert Syst. Appl. 2020, 149, 113285. [Google Scholar] [CrossRef]
Chu, Y.; Zhao, X.; Zou, Y.; Xu, W.; Han, J.; Zhao, Y. A decoding scheme for incomplete motor imagery EEG with deep belief 18 network. Front. Neurosci. 2018, 12, 680. [Google Scholar] [CrossRef]
Li, D.; Wang, J.; Xu, J.; Fang, X. Densely feature fusion based on convolutional neural networks for motor imagery EEG classification. IEEE Access 2019, 7, 132720–132730. [Google Scholar] [CrossRef]
Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Zhang, R.; Zong, Q.; Dou, L.; Zhao, X. A novel hybrid deep learning scheme for four-class motor imagery classification. J. Neural Eng. 2019, 16, 066004. [Google Scholar] [CrossRef] [PubMed]
Gaur, P.; Pachori, R.B.; Wang, H.; Prasad, G. A multi-class EEG-based BCI classification using multivariate empirical mode decomposition based filtering and Riemannian geometry. Expert Syst. Appl. 2018, 95, 201–211. [Google Scholar] [CrossRef]
Wu, H.; Niu, Y.; Li, F.; Li, Y.; Fu, B.; Shi, G.; Dong, M. A parallel multiscale filter bank convolutional neural networks for motor imagery EEG classification. Front. Neurosci. 2019, 13, 1275. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Liu, X.; Shen, Y.; Liu, J.; Yang, J.; Xiong, P.; Lin, F. Parallel spatial–temporal self-attention CNN-based motor imagery classification for BCI. Front. Neurosci. 2020, 14, 587520. [Google Scholar] [CrossRef] [PubMed]
Song, Y.; Wang, D.; Yue, K.; Zheng, N.; Shen, Z.J.M. EEG-based motor imagery classification with deep multi-task learning. In Proceedings of the 2019 International Joint Conference on Neural Networks, Budapest, Hungary, 14–19 July 2019; pp. 1–8. [Google Scholar]
Amin, S.U.; Alsulaiman, M.; Muhammad, G.; Mekhtiche, M.A.; Hossain, M.S. Deep Learning for EEG motor imagery classification based on multi-layer CNNs feature fusion. Future Gener. Comput. Syst. 2019, 101, 542–554. [Google Scholar] [CrossRef]
Ingolfsson, T.M.; Hersche, M.; Wang, X.; Kobayashi, N.; Cavigelli, L.; Benini, L. EEG-TCNet: An accurate temporal convolutional network for embedded motor-imagery brain–machine interfaces. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 2958–2965. [Google Scholar]
Zhu, L.; Zhu, J.; Ding, W.; Yang, J.; Hu, Q.; Ying, N.; Xu, P.; Zhang, J. Feature extraction algorithm of motor imaging EEG signals based on kernel method and manifold learning. J. Sens. Technol. 2022, 35, 504–510. [Google Scholar]
Zhao, X.; Zhang, H.; Zhu, G.; You, F.; Kuang, S.; Sun, L. A multi-branch 3D convolutional neural network for EEG-based motor imagery classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2019, 27, 2164–2177. [Google Scholar] [CrossRef]

Figure 1. 3 × 3 Illustration of sampling positions in standard and deformable convolution, (a) standard convolution of the sampling point arrangement rule (green dot), (b) the sensing field sampling position after the deformable convolution plus offset, and (c,d) are special cases of (b) showing that deformable convolution can cope with a variety of situations such as scale and rotation.

Figure 2. 3 × 3 Diagram of the deformable convolutional operation mechanism.

Figure 3. Traditional CNNs and DCNs have different sampling locations for the same area.

Figure 4. 3 × 3 Diagram of the deformable convolutional operation mechanism.

Figure 5. ResNet18 structure diagram.

Figure 6. DCN-ResNet18 neural network structure.

Figure 7. Identification block diagram.

Figure 8. Feedback motion imagination without recognition results.

Figure 9. Feedback motion imagination with recognition results.

Figure 10. Experimental paradigm timing diagram.

Figure 11. Example of an experimental cycle.

Figure 12. Time-frequency pattern of two-classification dataset, (a) imagining a time-frequency pattern of left-handed motion, (b) imagining a time-frequency pattern of right-handed motion.

Figure 13. Time-frequency pattern of four-classification datasets, (a) Imagining a time-frequency pattern of left-handed motion, (b) Imagining a time-frequency pattern of right-handed motion.

Table 1. Comparison of the structure of DCN-ResNet18 network and ResNet18 network.

ResNet18			DCN-ResNet18
Layer-Name	Output-Size	Module-Size	Layer-Name	Output-Size	Module-Size
conv1	112 × 112	7 × 7, 64, stride 2	conv1	112 × 112	7 × 7, 64, stride 2
conv2_x	56 × 56	3 × 3 max pool, stride 2	conv2_x	56 × 56	3 × 3 max pool, stride 2
conv2_x	56 × 56	$[\begin{matrix} 3 \times 3, 64 \\ 3 \times 3, 64 \end{matrix}] \times 2$	conv2_x	56 × 56	$[\begin{matrix} 3 \times 3, 64 \\ 3 \times 3, 64 \end{matrix}] \times 2$
conv3_x	28 × 28	$[\begin{matrix} 3 \times 3, 128 \\ 3 \times 3, 128 \end{matrix}] \times 2$	Conv3_x	28 × 2814 × 14	$[\begin{matrix} 3 \times 3, 128 \\ 3 \times 3, 128 \end{matrix}] \times 1$ $[\begin{matrix} D C N, 128 \\ D C N, 128 \end{matrix}] \times 1$
conv4_x	14 × 14	$[\begin{matrix} 3 \times 3, 256 \\ 3 \times 3, 256 \end{matrix}] \times 2$	Conv4_x	28 × 2814 × 14	$[\begin{matrix} 3 \times 3, 256 \\ 3 \times 3, 256 \end{matrix}] \times 1$ $[\begin{matrix} D C N, 256 \\ D C N, 256 \end{matrix}] \times 1$
conv5_x	7 × 7	$[\begin{matrix} 3 \times 3, 512 \\ 3 \times 3, 512 \end{matrix}] \times 2$	conv5_x	7 × 7	$[\begin{matrix} 3 \times 3, 512 \\ 3 \times 3, 512 \end{matrix}] \times 2$
	1 × 1	average pool fully connected layer softmax		1 × 1	average pool fully connected layer softmax

Table 2. The recognition accuracy of different network structures on dataset 2b (n represents the size of the convolutional kernel).

Residual Structure 3	Residual Structure 2
Residual Structure 3	n = 3	n = 5	n = 7
n = 3	90.3%	85.2%	83.4%
n = 5	86.4%	85.6%	84.8%
n = 7	85.1%	84.5%	83.1%