1. Introduction
In the future, BMI systems will have a wide range of applications, especially for patients who partially or entirely lose their mobility. For example, BMIs are used in the rehabilitation of post-stroke survivors whose brain damage impairs their mobility. In such implementations, the system captures brain signals, recognizes the user’s intention, and directly sends commands to external devices, e.g., an exoskeleton, a robotic hand, or other types of prosthetic devices [1,2]. BMI systems are also applied to patients who have completely lost their mobility by using brain signals recorded during motor imagery (MI) tasks [3,4]. There are also different MI applications, such as controlling a robot manipulator [5] or a lower-limb exoskeleton via brain signals [6]. Recently, there have been attempts to expand the application of BMI systems to more complex environments, such as controlling vehicles or assessing the current mental state of drivers [7].
Among the various recording paradigms for brain signals, the electroencephalogram (EEG) is commonly employed. EEG is acquired by placing electrodes at multiple areas of the subject’s scalp [8]. However, EEG signals are significantly influenced by various sources of noise, which makes them non-stationary and highly variable. Addressing this issue is crucial for developing robust BMI systems.
For such implementations of BMI systems, the following issues must be considered:
Improving the recognition accuracy with which BMI systems decode brain signals. Higher accuracy will improve the implementation of BMI systems, especially in applications such as prostheses and gadgets.
Real-time implementation of BMI systems, which requires a short processing time for the captured brain signals in addition to high accuracy. The processing time is strongly related to the amount of data and the processing methods.
To develop BMI systems for real-life applications, the accurate mapping of EEG data obtained from subjects onto their intended movements is essential. Previous methods have applied the k-nearest neighbor (k-NN) [9] and support vector machine (SVM) [10] classifiers to EEG classification. In [11,12], k-means and k-median core-set data selection methods are utilized, in which a representative subset of points is used to speed up the learning process. In [13], the authors show that many “unforgettable” examples, which are rarely misclassified once learned, can be omitted without impacting generalization. In [14], the authors proposed the Selection via Proxy (SVP) method to improve the computational efficiency of active learning and core-set selection in deep learning; a cheaper proxy model is substituted for the expensive target model during data selection. In [15], the authors proposed a data selection strategy for training a neural model that extracts the most significant information from the data and improves algorithm performance during training, leading to fewer computations and a reduction in classification error.
More recently, CNNs have gained popularity for EEG classification due to their automatic feature extraction capabilities [16,17,18]. While deep learning offers high accuracy, it comes with the drawbacks of long training times and a substantial amount of required training data. A typical approach for BMI applications is to reduce information redundancy through data optimization techniques such as channel optimization [19]. Nonetheless, this approach does not solve the issue of generalization but rather finds the optimal channels, which can differ among subjects. Transfer learning emerges as a solution to address these challenges [20,21]. In [22], the authors analyze whether specific layers are general or specific to one model; in addition, the similarity between tasks is quantified. The results show that transferred weights, whether frozen or fine-tuned, perform better than randomly initialized weights, especially in terms of generalization.
In contrast to previous works, we combine GAs and transfer learning to select the training data so as to improve the recognition rate and training time of BMI systems in real-time applications. In this work, a GA is employed to select the optimal data for training the BCNN. This ensures that, after transfer learning, the TCNN achieves a superior recognition rate and a shorter training time. The evaluation is conducted on two datasets: the first is the BCI Competition IV dataset [23], an online dataset collected from nine subjects performing MI tasks; the second dataset was collected from subjects in our laboratory at Hosei University and involves several grasping motions of the right hand, i.e., motor movement (MO) tasks.
We investigate various aspects of transfer learning, including the choice of pre-trained models, the selection of transfer layers, and the fine-tuning process. We explore the trade-offs between model complexity and performance, aiming to strike a balance that facilitates accurate and efficient BMI operation. Our findings contribute to the development of more practical and data-efficient BMI systems, which hold great potential for improving the quality of life of individuals with disabilities and expanding the scope of brain-controlled applications. The proposed algorithm can reveal similarities between the data selected by the GA and the data used to train the TCNN through transfer learning. In addition, we utilize the trained CNNs to map brain signals to robot motions in real time for human–robot interaction applications. The robot implementation shows good performance, with potential for a wide range of applications.
2. Method
The flowchart of the proposed method is shown in Figure 1. First, a single subject’s EEG data, called the target data, are separated from the EEG dataset of all subjects. The separated EEG data are used to train the TCNN. In our model, the GA selects the subjects whose EEG data will be used to train the BCNN. Initially, a population of individuals is generated, each characterized by a set of binary genes indicating whether or not each subject’s data are included in the training of the BCNN (Figure 1). Using the EEG data selected by the GA, we first train the BCNN. Then, we implement transfer learning, utilizing the pre-trained BCNN as the base model. The criterion for the GA’s selection of the dataset is to maximize the average classification accuracy of the TCNN. Based on this evaluation criterion, the best individuals are selected, and crossover and mutation operations are performed to generate the next generation. This iterative process is repeated for a predetermined number of generations, and the individual with the highest fitness is obtained.
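To make the data selection procedure concrete, the following is a minimal Python sketch of the subject-selection GA. The evaluate_fitness function is a hypothetical placeholder that would train the BCNN on the selected subjects, transfer it to the TCNN, and return the TCNN validation accuracy; the selection, crossover, and mutation operators and their settings are illustrative assumptions, not the exact configuration of our implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SUBJECTS = 7        # candidate subjects (target subject excluded)
POP_SIZE = 50         # population size used for BCI Competition IV 2a
N_GENERATIONS = 30
P_MUT = 0.05          # per-gene mutation probability (assumed value)

def evaluate_fitness(mask):
    # Placeholder: train the BCNN on the subjects where mask == 1,
    # transfer its convolutional layers to the TCNN, fine-tune on the
    # target subject, and return the TCNN validation accuracy.
    raise NotImplementedError

# Each individual is a binary mask: gene i = 1 means subject i's EEG
# data are included in BCNN training.
population = rng.integers(0, 2, size=(POP_SIZE, N_SUBJECTS))

for gen in range(N_GENERATIONS):
    fitness = np.array([evaluate_fitness(ind) for ind in population])

    # Elitist selection: keep the better half of the population as parents.
    parents = population[np.argsort(fitness)[::-1][:POP_SIZE // 2]]

    # Single-point crossover between randomly chosen parent pairs.
    children = []
    while len(children) < POP_SIZE - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, N_SUBJECTS)
        children.append(np.concatenate([a[:cut], b[cut:]]))
    children = np.array(children)

    # Bit-flip mutation.
    flip = rng.random(children.shape) < P_MUT
    children = np.where(flip, 1 - children, children)

    population = np.vstack([parents, children])

best = population[np.argmax([evaluate_fitness(ind) for ind in population])]
```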
2.1. Convolutional Neural Network (CNN)
DNNs are a category of neural networks loosely inspired by the human brain. Unlike shallow neural networks with only a few intermediate layers, DNNs consist of many intermediate layers, enabling them to handle more complex problems.
On the other hand, a CNN is a type of DNN that repeatedly stacks convolutional and pooling layers to extract features from the training data. In this work, EEGNet [9] is utilized for both the BCNN and the TCNN, as shown in Figure 2. The network comprises two primary convolution blocks for spatial–temporal fusion and a simple dense layer for classification. The first block includes two convolutional layers: a standard convolutional layer for temporal feature extraction and a depth-wise convolutional layer for frequency-specific spatial information. The subsequent block employs a separable convolutional layer to fuse spatial and temporal information. Additionally, this block utilizes the ReLU activation function, average pooling, and a dropout layer.
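As a concrete reference, the following is a minimal Keras sketch of an EEGNet-style network with the block structure described above (temporal convolution, depth-wise spatial convolution, separable convolution, and a dense softmax head). The filter counts, kernel lengths, and dropout rate are common EEGNet defaults used here as assumptions; they are not necessarily the exact hyperparameters of our models.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_eegnet(n_channels=22, n_samples=1000, n_classes=4,
                 f1=8, d=2, f2=16, dropout=0.5):
    """EEGNet-style model; defaults match a 22-channel, 4 s trial at 250 Hz."""
    inputs = layers.Input(shape=(n_channels, n_samples, 1))

    # Block 1: temporal convolution, then a depth-wise convolution that
    # learns frequency-specific spatial filters across the channels.
    x = layers.Conv2D(f1, (1, 64), padding="same", use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.DepthwiseConv2D((n_channels, 1), depth_multiplier=d,
                               use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.AveragePooling2D((1, 4))(x)
    x = layers.Dropout(dropout)(x)

    # Block 2: separable convolution fusing spatial and temporal features.
    x = layers.SeparableConv2D(f2, (1, 16), padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.AveragePooling2D((1, 8))(x)
    x = layers.Dropout(dropout)(x)

    # Classification head: a single dense softmax layer.
    x = layers.Flatten()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```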
2.2. Transfer Learning
In our method, transfer learning is implemented by transferring parts of the pre-trained BCNN model to the TCNN. To improve accuracy, CNNs typically begin the initial learning phase with randomly initialized parameters. However, because the initial connection weights are selected randomly, a long training process is required. Therefore, transferring a set of previously trained connection weights allows the TCNN to be trained faster and with less data, as shown in Figure 3.
There are two approaches to transfer learning: (1) fine-tuning, in which the transferred layers are updated on the new dataset, and (2) feature extraction, in which the transferred layers are frozen during training and only the model’s classification head is trained. In our implementation, we use fine-tuning.
Given the need for the model to learn complex data and recognize a wide range of patterns, the BCNN model is initially trained with the EEG data of subjects selected by the GA. Next, the convolutional layers’ parameters of the BCNN model are transferred to the TCNN model, which is then fine-tuned with the target subject dataset. This approach results in rapid parameter adjustment and often achieves a better recognition rate compared to training the model from scratch.
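A minimal sketch of this transfer step in Keras is shown below, reusing the build_eegnet function from the previous sketch. The weight file name, optimizer settings, and the commented fit call are hypothetical placeholders for illustration.

```python
import tensorflow as tf

# Pre-trained BCNN (weights assumed to be saved after training on the
# GA-selected subjects; the file name is a hypothetical placeholder).
bcnn = build_eegnet()
bcnn.load_weights("bcnn_ga_selected.h5")

# New TCNN for the target subject.
tcnn = build_eegnet()

# Transfer the parameters of every layer except the final dense
# classification layer from the BCNN to the TCNN.
for src, dst in zip(bcnn.layers[:-1], tcnn.layers[:-1]):
    dst.set_weights(src.get_weights())

# Fine-tuning: all transferred layers stay trainable and are updated
# together with the classification head on the target-subject data.
tcnn.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
             loss="categorical_crossentropy",
             metrics=["accuracy"])
# tcnn.fit(x_target_train, y_target_train, validation_split=0.1, epochs=50)
```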
3. Datasets
Due to the diverse tasks in BMI applications, two different datasets, covering MRCP and MI tasks, are utilized to evaluate the performance of the proposed method. In MRCP tasks, EEG signals correspond to the actual movement of the subjects’ limbs. Conversely, in MI tasks, EEG signals are recorded during imagined movements of the limbs. This section explains in detail how the datasets were created to train and test the proposed method.
3.1. BCI Competition IV2a
This dataset, publicly available online, was collected by Graz University of Technology. The data were recorded from nine subjects who performed four motor imagery tasks: right hand, left hand, tongue, and both feet. Data from each subject were gathered over six runs, with each run comprising 48 trials (12 trials for each motor imagery class). Throughout the experiments, subjects were seated in front of a computer screen.
Each trial began with a cross appearing on a black screen at t = 0 s, accompanied by a warning sound, as depicted in Figure 4. After 2 s (t = 2 s), a cue in the form of an arrow pointing left, right, down, or up, corresponding to one of the four motor imagery classes, was shown on the monitor. The arrow remained on the screen for 1.25 s and instructed the subjects to perform the target motor imagery task until the cross vanished at t = 6 s. EEG was recorded using twenty-two Ag/AgCl electrodes spaced 3.5 cm apart, sampled at 250 Hz, and band-pass filtered between 0.5 Hz and 100 Hz.
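For illustration, a simple NumPy sketch of how a motor-imagery epoch can be cut from the continuous recording using this timing is given below; the exact window used in our experiments may differ from this assumption.

```python
import numpy as np

FS = 250                 # sampling rate of the BCI Competition IV 2a recordings
CUE_ONSET_S = 2.0        # cue appears at t = 2 s
TRIAL_END_S = 6.0        # cross vanishes at t = 6 s

def extract_mi_epoch(continuous_eeg, trial_start_sample):
    """Cut the motor-imagery window (cue onset to end of trial) out of a
    continuous (n_channels, n_samples) recording."""
    start = trial_start_sample + int(CUE_ONSET_S * FS)
    stop = trial_start_sample + int(TRIAL_END_S * FS)
    return continuous_eeg[:, start:stop]   # shape: (22, 1000)
```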
3.2. CapiLab MRCP Dataset
In our laboratory, data are collected following the Ethics Policy on Research Involving Humans of Hosei University based on the Declaration of Helsinki.
We gathered EEG data from four subjects aged 20–22, who were seated in front of a laptop, as shown in Figure 5a. The MRCP tasks were grasping a ball, a smartphone, and a pen. In addition, a no-motion task was added, in which the subjects simply rested without grasping any object. The data acquisition procedure is illustrated in Figure 5b. Initially, animated objects depicting a ball, a smartphone, and a pen were displayed on the upper part of the monitor to remind the subjects to pay attention to the upcoming task; an animated hand was displayed on the lower part. As the animated object moved down toward the animated hand, a real object matching the animated one simultaneously reached the subject’s right palm. Once the animated object stopped moving, the grasping motion of the fingers started on the monitor, which lasted 2 s. Simultaneously, the subjects grasped the real objects.
The electrodes were arranged according to the international 10–20 system. We employed a Mitsar-EEG-201 system with 19 recording channels to record the user’s brain activity at a sampling frequency of 500 Hz. Because each hand motion was recorded for two seconds, each trial generated a 1000 × 19 data point matrix. A high-pass filter at 0.16 Hz and a low-pass filter at 40 Hz were applied, and a notch filter at 50 ± 5 Hz was used to reduce electric line noise.
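The following is a minimal SciPy sketch of this preprocessing chain (0.16–40 Hz band-pass plus 50 Hz notch). The filter types, orders, and quality factor are assumptions for illustration, not the exact filters of the recording system.

```python
from scipy.signal import butter, filtfilt, iirnotch

FS = 500  # CapiLab sampling rate (Hz)

def preprocess(eeg):
    """Band-pass (0.16-40 Hz) and 50 Hz notch filtering of a
    (n_channels, n_samples) trial."""
    # 4th-order Butterworth band-pass, applied forward and backward.
    b, a = butter(4, [0.16, 40.0], btype="bandpass", fs=FS)
    eeg = filtfilt(b, a, eeg, axis=-1)

    # Notch filter to suppress 50 Hz line noise.
    b, a = iirnotch(w0=50.0, Q=30.0, fs=FS)
    return filtfilt(b, a, eeg, axis=-1)
```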
4. Results and Discussion
In our implementation, we divided the collected data into training data (90% of the total) and validation data (the remaining 10%). The GA and CNN parameters are shown in Table 1 and Table 2, respectively. The process began by selecting one subject’s data for the TCNN. The GA then determined the dataset from the remaining subjects used to train the BCNN. In our implementation, the fitness function of the GA is the average recognition accuracy of the TCNN after transfer learning. If the GA selected zero subjects, training was skipped, and the accuracy was set to −0.5.
We evaluated similarity using correlation coefficients and Euclidean distances. Supposing that x and y are two different signals of length N, their cross-correlation is calculated as

$$R_{xy}(m) = \sum_{n=1}^{N} x(n)\, y^{*}(n+m),$$

where * denotes the complex conjugate operation. Another similarity measure is the Euclidean distance, which is calculated as

$$d(x, y) = \sqrt{\sum_{n=1}^{N} \left( x(n) - y(n) \right)^{2}}.$$
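A simple NumPy sketch of these two similarity measures, applied to two flattened EEG trials of equal length, is given below; for simplicity, the correlation is computed here as the Pearson coefficient, which is an assumption rather than the exact normalization used in our analysis.

```python
import numpy as np

def similarity(x, y):
    """Correlation coefficient and Euclidean distance between two
    equal-length (flattened) EEG signals."""
    x = np.ravel(x).astype(float)
    y = np.ravel(y).astype(float)
    corr = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
    dist = np.linalg.norm(x - y)     # Euclidean distance
    return corr, dist
```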
4.1. Results of the BCI Competition IV2a Dataset
Because the data of Subject 4 were damaged, they were removed from the dataset; therefore, the data of eight subjects were used to evaluate the method on MI tasks. Subject 8 was selected as the target subject, and its data were used to train the TCNN via transfer learning. The GA selected the best data from the remaining seven subjects to train the BCNN.
The GA population size was 50 individuals, evolving through 30 generations. Figure 6 shows the average fitness value (recognition accuracy after transfer learning) and its standard deviation for each generation over the course of evolution. The GA converged after 20 generations, with a consistent increase in the objective function in each generation. Table 3 shows the correlation coefficients and Euclidean distances measuring the similarity between the data of the subjects selected by the GA (Subjects 2, 3, and 7) and the target subject (Subject 8).
4.2. Results of CapiLab Dataset
In these experiments, Subject 4 was selected as the target subject, and the GA selected the best subjects to train the BCNN from the remaining three subjects. There are 600 trials for each subject (150 per class, each a 19-channel × 1000-data-point matrix), resulting in a total of 2400 trials across the four subjects. We reduced the GA population size to 20 individuals and the termination criterion to 20 generations because of the small number of subjects the GA searches through.
Figure 7 shows the average fitness for each generation over the course of evolution. The objective function increased quickly during the first generations and converged by the tenth generation, because the search space of the GA is small (three subjects).
Table 4 presents the similarity between the subjects selected by the GA and the target subject used to train the TCNN. It demonstrates that the subjects chosen by the GA exhibited the highest similarity to Subject 4, in terms of both the correlation coefficient and the Euclidean distance.
However, there is an issue requiring further investigation. As shown in Table 3 and Table 4, only two out of seven results agreed with the similarity criteria for the BCI Competition dataset, and one out of three agreed for the CapiLab dataset. We attribute this to the convolutional operations of EEGNet, which employ depth-wise convolution for spatial–temporal information fusion, combining temporal and spatial similarities. Another issue is the filtering applied during data preprocessing: some of the spatial information was lost due to the applied filters.
4.3. Comparison of CNN Recognition Rates Trained with All and GA-Selected Subject Data
Table 5 shows the recognition accuracy in four cases: (1) all subjects’ data are used to train the BCNN; (2) the data of the subjects with the highest correlation coefficients are used (Subjects 3, 4, and 6 for BCI Competition IV and Subject 3 for CapiLab); (3) the data of the subjects with the lowest Euclidean distances are used (Subjects 2, 3, and 6 for BCI Competition IV and Subject 1 for CapiLab); and (4) only the data of the subjects selected by the GA are used. The results show that employing the GA to optimize the training data improved recognition accuracy by approximately 11% and 4% for the BCI Competition IV and CapiLab datasets, respectively. The reduction in the size of the training dataset also significantly reduces training time, demonstrating the feasibility of the proposed method in selecting the best subjects’ data.
4.4. Real-Time Robot Control Using Brain Signals
For the CapiLab dataset, we verified the performance of the TCNN in real-time control of a robotic arm. The experimental setup is shown in Figure 8. The trained TCNN (Subject 4) recognizes the object grasped by the subject from the brain signals, and the robot manipulator moves to the target object placed on the table. The subject wore a gel-based EEG cap connected to the EEG amplifier. In our implementation, we utilized a KINOVA Gen 3 lightweight robot manipulator. The response time of the robot is 2.075 s: 2 s for brain data collection and 0.075 s for the TCNN to recognize the grasped object. For the experiment, we limited the speed of the robot motion for safety reasons, so the robot reached the target object in around 2.4 s.
Figure 9 shows video captures of three different trials. The recognition rate decreased to nearly 60% due to slight changes in the electrode positions; therefore, collecting more training data is necessary to improve the robustness of the developed BMI system. Such an implementation increases the applicability of BMI systems for human–robot interaction (HRI) tasks.
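To illustrate how the pieces fit together, below is a minimal sketch of the real-time decision loop, reusing the preprocess and build_eegnet sketches above. The acquire_eeg and move_robot_to interfaces are hypothetical placeholders for the amplifier driver and the manipulator controller, and the class order is an assumption; none of this represents a published API.

```python
import numpy as np

FS = 500                     # CapiLab sampling rate (Hz)
WINDOW_S = 2.0               # 2 s of EEG per decision
CLASSES = ["ball", "smartphone", "pen", "rest"]   # assumed class order

def control_loop(acquire_eeg, tcnn, move_robot_to):
    """Acquire a 2 s EEG window, classify it with the trained TCNN, and
    command the manipulator toward the recognized object."""
    while True:
        eeg = acquire_eeg(int(WINDOW_S * FS))          # (19, 1000) window
        x = preprocess(eeg)[np.newaxis, ..., np.newaxis]
        probs = tcnn.predict(x, verbose=0)[0]          # ~0.075 s inference
        label = CLASSES[int(np.argmax(probs))]
        if label != "rest":
            move_robot_to(label)                       # reach the target object
```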
4.5. Discussion
The proposed method outperformed other data selection methods and achieves accurate recognition with a reduced amount of data. The experimental results demonstrated significant improvements in the recognition accuracy of the TCNN when the BCNN was trained with the GA-selected data. Moreover, beyond the accuracy gains, the GA-selected training data contributed to a notable reduction in CNN training time. This efficiency improvement is particularly crucial for real-time robotic applications where response time is essential. The findings of this work not only underscore the effectiveness of the proposed approach but also highlight its potential for practical implementations. By reducing the amount of training data, the proposed method minimizes the computational burden of CNNs. This work can be further extended to the exploration and refinement of hybrid techniques in BMI systems, contributing to advancements in neurotechnology and human–machine interfaces.