**4. Results and Discussion**

This section validates the proposed method, explores some relevant hyper-parameters of the CNN architecture, and compares the classification results with other reported methods.

#### *4.1. Validation of Proposed Method*

The first experiment compared the performance of two BSS approaches: SOS and HOS. To evaluate the proposed methodology, an SOS algorithm, Robust Second Order Blind Identification (SOBIRO), and the previously mentioned fastICA, belonging to the family of HOS algorithms, were tested on one subject. Figure 5 shows the training and validation values over 20 epochs for four cases: Figure 5a without BSS preprocessing, Figure 5b with fastICA without the sort criterion, Figure 5c with sorted SOBIRO, and Figure 5d with sorted fastICA. Considering the structure of the database used in [32], 50% of the trials were set as the training set, and the remaining 50% were selected as the test set. The CNN architecture initially used was the same as that proposed in [32]. Nevertheless, since the input data are organized in a different way, it was necessary to make some adjustments to the convolution stride and max-pooling size. The CNN architecture used in this work is shown in Table 1.
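The per-trial HOS decomposition step can be sketched as follows. This is a minimal illustration using scikit-learn's `FastICA` as the HOS-BSS implementation; the channel count, trial length, and number of components below are assumptions for the example, not values from the paper, and the data are synthetic.

```python
# Minimal sketch: per-trial fastICA decomposition of an EEG trial.
# Assumptions (not from the paper): 118 channels, 350 samples per trial,
# 10 estimated sources, and scikit-learn's FastICA as the HOS algorithm.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
trial = rng.standard_normal((118, 350))  # channels x samples (synthetic)

# FastICA expects samples x features, so the trial is transposed. The
# estimated sources come back in an arbitrary order, which is what
# motivates the sort criterion discussed in the text.
ica = FastICA(n_components=10, random_state=0, max_iter=500)
sources = ica.fit_transform(trial.T).T  # 10 sources x 350 samples
print(sources.shape)
```

Because ICA is blind to source ordering, running this on two trials of the same class can place the same physiological source at different positions, which is the permutation ambiguity the sort criterion addresses.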


**Table 1.** CNN modified architecture inspired from [32].

In the graphs it is possible to observe that in case (a), without the BSS preprocessing stage, the maximum training accuracy is near 0.8, while the validation accuracy is below 0.6. These results can be explained by the reduced amount of data for a deep learning approach: large amounts of data are required to achieve an end-to-end system in which the convolutional layers are able to find the determining patterns that allow classifying movement intentions. In case (b), where fastICA is applied to each trial but without the sort step, the training accuracy reaches values close to 0.98, but the validation accuracy is below 0.60 and decreases with each epoch. This can be explained by the disorder of the estimated sources in each trial: the CNN learns the training set but fails to generalize to the test set. This result validates the hypothesis that a sort criterion is needed for the sources estimated through BSS. In case (c), using the second-order statistics (SOS) algorithm SOBIRO for BSS, with the sources sorted by the criterion explained in Figure 2, the test accuracy reached 0.73, improving on the (a) and (b) responses. However, the differences between train and test accuracy are considerably large, which indicates overfitting. Finally, in case (d), with the sorted HOS fastICA, both the training and the validation accuracy achieve values higher than 0.8, reducing the overfitting. Results (c) and (d) are in accordance with numerous works reporting the superiority of HOS-BSS algorithms over SOS-BSS algorithms for EEG preprocessing [45–47]. Sorted fastICA was therefore chosen as the BSS algorithm before the CWT generation and the subsequent CNN classification stages.

**Figure 5.** Training and validation behaviour for subject *aa*: (**a**) without BSS; (**b**) with unsorted fastICA; (**c**) with sorted SOBIRO; and (**d**) with sorted fastICA (30 epochs).

#### *4.2. Comparison with Other Methods*

Table 2 shows the validation accuracy for each *k*-fold and each subject of dataset IVa of BCI competition III. The test set was divided into 10 equal parts for cross validation. The maximum classification value was chosen in each case.

As shown in Table 2, the average classification accuracy is 94.66%, with a standard deviation *σ* of 6.46. Across the five subjects, the maximum average k-fold accuracy was 97.81% with *σ* of 3.34 for subject *aa*, and the minimum was 92.18% with *σ* of 6.98 for subject *ay*. Table 3 lists some recent works that report results on the same dataset [48–51].
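The aggregation of per-fold results into the reported mean and standard deviation can be sketched as below. The fold accuracies used here are illustrative placeholders, not the values from Table 2.

```python
# Minimal sketch of the 10-fold aggregation: given one accuracy per fold,
# compute the mean and the sample standard deviation as percentages.
# The fold accuracies below are illustrative, not the paper's values.
import numpy as np

fold_accuracies = np.array(
    [0.95, 0.97, 0.93, 0.99, 0.96, 0.94, 0.92, 0.98, 0.95, 0.96]
)

mean_acc = fold_accuracies.mean() * 100       # reported as a percentage
std_acc = fold_accuracies.std(ddof=1) * 100   # sample standard deviation

print(f"accuracy: {mean_acc:.2f}% (sigma = {std_acc:.2f})")
```

The same computation, repeated per subject and then averaged, yields the per-subject and overall figures reported in the text.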


**Table 2.** 10-fold cross validation accuracy.

**Table 3.** Comparison with other works using the IVa of BCI competition III dataset.


#### *4.3. Discussion*

One of the main contributions of this work is the criterion for sorting the estimated sources. Figure 6 shows the spectral components of the estimated sources before and after applying the sort criterion in one trial. The components with more information in the *μ* and *β* bands are generally placed in the top positions, while the components least associated with these frequencies end up at the bottom. For example, the first spectral component in Figure 6a, which has more energy at frequencies above 20 Hz but without *α* and *β* contribution, is placed at the bottom in Figure 6b. In contrast, the component located in the seventh position in Figure 6a, which has most of its energy in the *μ* region, is located in the first position in Figure 6b.
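A band-energy ranking of this kind can be sketched as follows. This is an assumed, simplified form of the criterion (FFT band power in the *μ*/*β* range, sorted in descending order); the sampling rate and array shapes are placeholders, not the paper's values.

```python
# Minimal sketch of a band-energy sort criterion (an assumed, simplified
# form of the paper's criterion): rank estimated sources by their spectral
# energy in the mu (8-13 Hz) and beta (13-30 Hz) bands, highest first.
import numpy as np

fs = 100  # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)
sources = rng.standard_normal((10, 350))  # 10 estimated sources x samples

# Power spectrum of each source via the real FFT.
freqs = np.fft.rfftfreq(sources.shape[1], d=1.0 / fs)
power = np.abs(np.fft.rfft(sources, axis=1)) ** 2

# Total energy inside the mu + beta region, per source.
band = (freqs >= 8.0) & (freqs <= 30.0)
band_energy = power[:, band].sum(axis=1)

# Reorder: sources with the most mu/beta energy go to the top positions.
order = np.argsort(band_energy)[::-1]
sorted_sources = sources[order]
```

Applying the same deterministic ranking to every trial removes the permutation ambiguity of BSS, so that a given row position carries comparable spectral content across trials.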

**Figure 6.** Analysis of spectral components: (**a**) before sorting; (**b**) after sorting. The components with more energy in the *μ* and *β* frequencies are placed at the top after sorting.

The other distinctive part of the proposed method is the use of a CNN architecture and the adjustment of some relevant hyper-parameters. Taking as initial values those shown in Table 1, and changing the kernel height and width for each convolutional layer, the behaviour of the validation accuracy is analyzed for each case. The kernel size is (*y*, *x*), where the *y*-axis contains frequential and spatial information, while the *x*-axis contains temporal information. Figure 7 depicts the validation accuracy for subject *aa*. The kernel sizes selected for the comparison in the first convolutional layer were (*i*, 1), (*i*, 3), (*i*, 5), and (*i*, 7), with *i* taking odd values from 1 to 9.

**Figure 7.** Analysis of kernel size in the first convolutional layer: (**a**) (*i*, 1); (**b**) (*i*, 3); (**c**) (*i*, 5); and (**d**) (*i*, 7). The kernel size (7, 5) in (**c**) presented the least overfitting and the highest classification accuracy.

According to the analysis of kernel size in the first convolutional layer, the size (7, 5) (Figure 7c) presented the least overfitting and the highest classification accuracy. It is also noted that, when the kernel height (*y*-axis) is 3, although a maximum value close to the best case is reached, this kernel size generates high overfitting after a certain number of epochs. On the other hand, a kernel height of 1 generates the lowest maximum values for all combinations of *x*-axis sizes.

These results are in accordance with other work where the kernel size is also studied, which reports that the vertical locations (frequency-space) are of great importance for the classification performance, while, in contrast, the horizontal locations (time) are not as significant [27].

Once the kernel size of the first layer was fixed, a similar analysis was made for the kernel of the second convolutional layer. Figure 8 depicts the validation accuracy for subject *aa*. Taking into account that (4, 4) max-pooling has been applied beforehand, the kernel sizes selected were (*j*, 1), (*j*, 2), (*j*, 3), and (*j*, 4), with *j* taking values from 1 to 5.
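The way kernel size, stride, and pooling shrink the feature map through these two layers can be traced with a few lines of arithmetic. The input dimensions below are assumed for illustration and are not the paper's exact values.

```python
# Minimal sketch (illustrative dimensions, not the paper's): trace how the
# feature map shrinks through conv -> max-pool -> conv along each axis.
def conv_out(size, kernel, stride=1):
    """Output length of a valid (no-padding) convolution along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output length of non-overlapping max-pooling along one axis."""
    return size // pool

# Assumed input: height 64 (frequency-space) x width 350 (time).
h, w = 64, 350
h, w = conv_out(h, 7), conv_out(w, 5)   # first conv layer, kernel (7, 5)
h, w = pool_out(h, 4), pool_out(w, 4)   # (4, 4) max-pooling
h, w = conv_out(h, 1), conv_out(w, 1)   # second conv layer, kernel (1, 1)
print(h, w)
```

After the (4, 4) pooling the vertical axis is already heavily compressed, which is consistent with the observation below that only a kernel height of 1 remains stable in the second layer.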

**Figure 8.** Analysis of kernel size in the second convolutional layer: (**a**) (*j*, 1); (**b**) (*j*, 2); (**c**) (*j*, 3); and (**d**) (*j*, 4). The kernel size (1, 1) in (**a**) presented the least overfitting and the highest classification accuracy.

In this case, a kernel height (*y*-axis) of 1 is the only one for which the CNN achieves a stable accuracy throughout the epochs, independently of the *x*-axis size. In the other cases, the validation accuracy decreases to 60%, or even to around 50% when the *y*-axis size is 5.

As is well known, deep learning approaches are currently the state of the art in many image processing and computer vision tasks. However, in contrast with two-dimensional static images, EEG signals are dynamic time series, where the generalizable MI patterns are spatially distributed and mixed across the channels around the motor region. In addition, the low signal-to-noise ratio can make learning features in an end-to-end fashion more difficult for EEG signals than for images [29]. On the other hand, deep learning approaches require a large amount of training data in order to obtain descriptors that allow discrimination between different classes. In the particular case of EEG and MI this is a limitation, since the data must be processed independently for each subject and, due to fatigue, MI databases are relatively small. To deal with this problem, some works based on deep learning have performed data augmentation, using some criterion to generate simulated data from the training set. This approach has yielded good results. However, the generation of artificial data can be risky without a rigorous methodology, since it can produce false data that inflates the accuracy for a particular dataset.
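One common augmentation criterion of the kind referred to above can be sketched as follows. This is purely illustrative of what other works do; the present method does not use augmentation, and the shapes and noise scale are assumptions.

```python
# Illustrative sketch of one common EEG augmentation criterion (used by
# other works, not by this paper): generate extra trials by adding
# low-amplitude Gaussian noise to the existing training trials.
import numpy as np

rng = np.random.default_rng(2)
train = rng.standard_normal((50, 118, 350))  # trials x channels x samples

# Noise amplitude set to a small fraction of the signal's spread (assumed).
noise_scale = 0.05 * train.std()
augmented = train + rng.normal(0.0, noise_scale, size=train.shape)

# Double the training set with the noisy copies; labels are duplicated too.
train_aug = np.concatenate([train, augmented], axis=0)
print(train_aug.shape)
```

As the text cautions, such simulated trials share the statistics of the training set, so accuracy gains measured on a single dataset may not reflect genuine generalization.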
