Article

Analyzing the Effectiveness of the Brain–Computer Interface for Task Discerning Based on Machine Learning

1 Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdansk, Poland
2 Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdansk, Poland
3 Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdansk, Poland
* Author to whom correspondence should be addressed.
Sensors 2020, 20(8), 2403; https://doi.org/10.3390/s20082403
Submission received: 12 March 2020 / Revised: 15 April 2020 / Accepted: 21 April 2020 / Published: 23 April 2020
(This article belongs to the Special Issue Sensors for Biomedical Imaging)

Abstract

The aim of the study is to compare electroencephalographic (EEG) signal feature extraction methods in the context of the effectiveness of the classification of brain activities. For classification, electroencephalographic signals were obtained using an EEG device from 17 subjects in three mental states (relaxation, excitation, and solving a logical task). Blind source separation employing independent component analysis (ICA) was performed on the obtained signals. Welch's method, autoregressive modeling, and the discrete wavelet transform were used for feature extraction. Principal component analysis (PCA) was performed in order to reduce the dimensionality of the feature vectors. k-Nearest Neighbors (kNN), Support Vector Machines (SVM), and Neural Networks (NN) were employed for classification. Precision, recall, and F1 score are reported, along with a discussion based on statistical analysis. The paper also contains code utilized in the preprocessing and the main part of the experiments.

1. Introduction

The spontaneous electrical activity of the brain, acquired noninvasively from electrodes placed on the human scalp, is extensively explored in many areas of interest, to name a few: neuroscience, cognitive science, emotion recognition, gaming experience, etc. [1,2]. Research on the brain–computer interface (BCI) was primarily motivated by the need to support disabled people in interacting with their environment [3,4,5]. Moreover, applications such as detecting and classifying epileptic seizures based on EEG signals [6], monitoring driver fatigue [7], sleep disturbance detection [8], recognizing different mental states [8,9], etc. are of great importance.
The practical implementation of brain–computer interface (BCI) systems uses electroencephalographic (EEG) signals [7,10,11,12]. In BCI systems, the recorded signal is preconditioned in order to eliminate artifacts and interferences resulting, among others, from eye blinks, eye movements, muscle activity, or signal drift due to electrode misplacement [1,13,14,15,16]. Optionally, the signal can also be subjected to a blind source separation procedure; methods such as Independent Component Analysis (ICA) are used for this purpose [17,18,19,20,21,22,23,24,25,26]. Then, feature extraction, i.e., reduction of the signal to a vector of parameters of lower dimensionality, is performed [27,28,29]. Such a reduction makes it possible to distinguish signals representing the different types of mental activity that the BCI system is to recognize [10,30]. However, in deep learning classification, feature extraction is not always applied, as signal characteristics may be derived automatically, e.g., by autoencoders [31,32]. Moreover, Wu et al. proposed an experimental scenario in which feature selection and classification were performed simultaneously [33]. The proposed method was applied to a high-dimensional setting with the number of features larger than the number of samples [33]. Finally, machine learning methods, including both baseline algorithms such as k-Nearest Neighbors (k-NN), Random Forest [34], or Support Vector Machine (SVM) [35,36], as well as deep learning methods [37,38,39,40,41,42,43], are extensively employed in discerning mental states or classifying brain activity. Overall, it is evident that a hybrid approach is needed to classify mental states regardless of the application area. Therefore, the most challenging issues related to recognizing mental states based on the recorded EEG signal are the selection of signal analysis and classification methods. In the most recent survey by Gu et al. [44], one may find references to BCI contributions to several fields of research and applications. A table containing an overview of EEG devices with their characteristics is given with adequate references. This survey presents a comparison between deep learning neural networks and traditional machine learning methods to demonstrate the recent improvements of deep learning algorithms in EEG analysis. Overall, several topics are addressed by Gu et al., i.e., advances in sensors and sensing technologies, characteristics of signal enhancement and online processing, recent machine learning algorithms and interpretable fuzzy models for BCI applications, state-of-the-art deep learning algorithms and combined approaches for BCI applications, and the evolution of healthcare systems and applications in BCIs [44]. Further, artifact removal techniques for the EEG signal are discussed, along with real-time EEG signal analysis. Equally valuable, comprehensive, and thorough is a review prepared by Zhang et al. [35]. The focus of this survey is on advances in applying deep learning to BCI, as well as on showing new frontiers. An important aspect of this review is that it gives details concerning the EEG signal types under classification, along with the classification methods employed. Indeed, one should refer to this survey as it comprises a systematic review of brain signals and deep learning techniques for BCI.
The paper discusses popular deep learning techniques and state-of-the-art models for BCI signals, reviews the applications and remaining challenges of deep learning-based BCI, and finally highlights some promising directions for future research. It is also interesting to read a survey from 2010 [45], in which the impact of various events, namely sleep, epilepsy, reflexology, drugs/anesthesia, diabetes, meditation, music, and artifacts, on the EEG signal is described. One of the most important topics contained in both surveys is related to transfer learning methodologies, which may be crucial in exploiting the knowledge acquired to enhance classification performance [35,44].
The survey by Zhang et al. examines 232 literature sources [35], and Gu et al. [44] provide 209 references; a Google search returns a plethora of publications related to EEG-based BCI, thus it is not possible to follow all the threads presented. However, an attempt to recall some works from the literature is made herein, including selected sources, to show that there is no single way of dealing with EEG signals in terms of preprocessing, feature extraction (if any strategy is applied), classification scheme, etc. On the basis of such a recollection, one may easily see the limitations of one's own study and treat it as a starting point for future research directions.
Examples of the EEG-based classification performance obtained for various application tasks are given in Table 1, including the literature resources recalled in the survey by Zhang et al. [35] and Gu et al. [44] as well as some retrieved from other publications.
For the study carried out, we have chosen in part a classical approach to the classification of EEG signals (i.e., feature extraction plus a learning algorithm) and, in part, a deep learning model. To compare both approaches, EEG signals acquired at our laboratory were utilized. We are aware that a great number of datasets are available to the public, examples of which are included in [49,51,53,61,66,67,68,69,70,71,72,73,74,75,76,77,78], and they could be employed, e.g., as test data or in transfer learning applied to deep learning. However, many of the cited works are also exploratory in character [7,9,47,55,63]; they include a variety of datasets, signal acquisition methods, data formats, etc., which cannot be directly compared to the outcome of our study. Therefore, we decided to use our own locally acquired data, especially as the experiments also served other purposes.
The aim of the study presented is to create a practical framework for the automatic classification of mental states. It comprises both signal analysis and several selected classification algorithms. The classification schemes are compared as to their overall effectiveness in the automatic classification of mental states. For this purpose, EEG signals from 17 people in three different mental states—relaxation (called meditation), excitation (called music video), and solving a logical task (called logic game)—are collected using an Emotiv EPOC+ helmet [79]. These raw signals were acquired from a set of standard positions: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4, according to the 10–20 (10%) extended electrode configuration on the scalp [80,81,82]. The acquired signals are separated by means of independent component analysis (ICA). For the extraction of features from the signals, Welch's method (for estimation of the power spectral density (PSD) of a given time sequence), autoregressive modeling (Burg algorithm), and the discrete wavelet transform (DWT) are selected. Such an approach is seen in many other literature sources [35,44,45,83]. The obtained feature vectors are reduced by Principal Component Analysis (PCA). To complete the EEG signal processing framework for classifying mental states, three classification methods are used: k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Neural Network (NN), the last belonging to the category of deep learning. As pointed out in the survey by Zhang et al. [35], the recent advances in deep learning-based BCI refer mostly to deep learning techniques, which is why an NN was also included among the classifiers employed in the study. However, it should be noted that a simple model with three hidden layers and the LeakyReLU activation function is adopted in our study.
The organization of this work is as follows. The following Section describes the dataset building and preprocessing to which the signals are subjected. Section 3 contains a thorough presentation of experiments, which consists of the EEG-based signal classification. Details regarding the technique used to reduce the dimensionality of feature vectors, given classifier settings and results obtained, are discussed. For performance evaluation, two schemes are executed: In the first one, an 80/20% split of the dataset into training/test sets is produced for k-NN and SVM, and a 70% training set, 10% validation set, and 20% test set for the NN algorithm. Moreover, 10-fold cross-validation for a more reliable assessment of classification performance is carried out on the best and the worst outcomes of the first validation scheme. This allowed us to check that the model can be trained repetitively with a similar result regardless of the choice of examples for training [84]. For each classifier performance, precision, recall, and F1 score are shown. Moreover, statistical analysis is performed for the experiments, resulting in appropriate metrics as well as indicating whether the differences obtained for two validation schemes are statistically significant. The paper also contains observations on limitations of the investigation carried out and possible ways to overcome them, as well as conclusions resulting from the conducted research. The prepared code snippets are contained in Appendix A and an attached zip file.

2. Materials and Methods

EEG signals of 17 subjects participating in the experiment were acquired. In the first stage of the research, the participants were instructed to relax. In the second phase, the subjects watched a music video. In the last stage, the subjects played a game involving logical thinking. For a given subject, the durations of all stages were equal but varied between subjects. An Emotiv EPOC+ device equipped with 14 measuring electrodes was used to acquire the signals [79]. The sampling frequency was set to 128 Hz.
The article contains snippets of Python [85] code to illustrate performed operations. They are simplified versions of the code used for calculations. These snippets are contained in Appendix A; the code is also available to interested parties (see Supplementary Materials for the online address). The flowchart of the study performed is shown in Figure 1.

2.1. Building the Dataset

For each subject, the last 50 s of recorded signals, as well as 50 s of signals recorded between successive stages, were discarded. The remaining signals were divided into 1 s frames with a 0.5 s overlap. Thus, a single frame has the form of a matrix with dimensions (128, 14), i.e., 128 samples for each of the 14 channels. Each frame is assigned the corresponding category: meditation, music video, or logic game. The final number of frames was 24,795, i.e., 8265 for each category.
Overlap means that, for a given subject, the last l samples of the i-th frame of a given category from a given channel have the same values as the first l samples of the (i + 1)-th frame of that category and from that channel. The purpose of using overlap is dataset augmentation.
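As an illustration of the framing step, the sketch below assumes the per-subject recording is stored as a NumPy array of shape (n_samples, 14); the function and variable names are ours and not part of the original code.

import numpy as np

FS = 128  # sampling frequency in Hz

def make_frames(recording, frame_len=FS, step=FS // 2):
    # split a (n_samples, 14) recording into 1 s frames with 0.5 s overlap
    frames = [recording[start:start + frame_len, :]          # each frame: (128, 14)
              for start in range(0, recording.shape[0] - frame_len + 1, step)]
    return np.stack(frames)

# usage (hypothetical): frames = make_frames(subject_recording)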

2.2. Data Preprocessing

For each frame, mean values and variances of each of 14 channels were calculated, giving 28 values per frame. They were saved for later use. Afterward, each channel of every frame was detrended using the scipy.signal.detrend function. Then, every frame was whitened and subjected to independent component analysis (ICA [17]) using the FastICA algorithm (see Appendix A).
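A minimal sketch of this per-frame preprocessing chain, assuming a frame of shape (128, 14); it mirrors the simplified FastICA snippet given in Appendix A, and the helper name is illustrative.

import numpy as np
from scipy.signal import detrend
from sklearn.decomposition import FastICA

def preprocess_frame(frame):
    # frame: (128, 14) array, one second of 14-channel EEG
    means = frame.mean(axis=0)       # 14 per-channel means, stored as features
    variances = frame.var(axis=0)    # 14 per-channel variances, stored as features
    frame = detrend(frame, axis=0)   # remove the linear trend channel-wise
    ica = FastICA(n_components=14, whiten=True)  # whitening + ICA, as in Appendix A
    sources = ica.fit_transform(frame)
    return sources, means, variances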
Subsequently, for each channel in each frame, features were computed using feature extraction schemes described further on. Then, the feature vectors corresponding to subsequent channels were concatenated into one feature vector. Finally, previously computed mean values and variances were attached to the feature vector.
  • ar16: for each channel of every frame, 16th order autoregressive models were computed using the Burg algorithm. The arburg function from the spectrum library was used for that. Only the real values of computed model coefficients were utilized (imaginary values were all equal to 0). After concatenating model coefficients from all channels with previously computed mean values and variances, final feature vectors of 252 elements were obtained. The code employed for the aforementioned calculations is contained in Appendix A.
  • ar24: like ar16, but autoregressive models were of the 24th order. The final feature vectors contained 364 elements.
  • welch16: for every channel in every frame, an estimate of the power spectral density (PSD) was computed using the Welch method. The welch function from the scipy library was used for that. Samples from each channel in every frame were divided into eight nonoverlapping subframes of 16 samples each. Subsequently, nine coefficients were obtained per channel. Final feature vectors (with pre-computed mean values and variances) contained 154 elements. Calculations were conducted with the use of the code shown in Appendix A.
  • welch32: like welch16, but frames were divided into four nonoverlapping subframes of 32 samples each. Final feature vectors consisted of 266 elements.
  • welch64: like welch16 and welch32, but frames were divided into two nonoverlapping subframes, each of 64 samples per channel. Final feature vectors consisted of 490 elements.
  • dwt: each channel of every frame was decomposed using the 4th-level discrete wavelet transform with the db4 wavelet using the wavedec function from the pywt library. The resulting vectors contained 14, 14, 22, 37, and 67 coefficients, respectively. After concatenating the coefficient vectors with pre-computed mean values and variances, final feature vectors of 2184 elements were obtained. The transcription of this algorithm is provided in a listing contained in Appendix A.
  • dwt_stat: each channel of every frame was decomposed with the discrete wavelet transform as in the dwt scheme. Subsequently, for each of the five wavelet coefficient vectors, the following descriptive parameters were computed: mean value, mean value of absolute values, variance, skewness, kurtosis, zero-crossing rate, and the sum of squares (see Appendix A for the code used; an illustrative sketch of this scheme, together with the Welch-based one, follows this list).
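For illustration only, the sketch below shows how the welch16 and dwt_stat parameters can be obtained for a single channel of a single frame; the library calls are those named above (scipy.signal.welch, pywt.wavedec), while the helper functions and the zero-crossing estimate are our assumptions.

import numpy as np
import pywt
from scipy.signal import welch
from scipy.stats import skew, kurtosis

def welch16_features(channel):
    # welch16: eight nonoverlapping 16-sample subframes -> 9 PSD coefficients
    _, psd = welch(channel, fs=128, nperseg=16, noverlap=0)
    return psd

def dwt_stat_features(channel):
    # 4th-level DWT with the db4 wavelet, then 7 descriptive statistics per coefficient vector
    coeff_vectors = pywt.wavedec(channel, 'db4', level=4)
    feats = []
    for c in coeff_vectors:
        feats += [c.mean(), np.abs(c).mean(), c.var(), skew(c), kurtosis(c),
                  np.mean(np.diff(np.sign(c)) != 0),   # zero-crossing rate
                  np.sum(c ** 2)]                      # sum of squares
    return np.array(feats)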
Dimensionalities of the feature vectors obtained with the aforementioned schemes were reduced via principal component analysis (PCA). For each set of features derived from the training dataset, PCA was performed, retaining 95% of the variance of the training-set features. Then, validation and test data were projected onto the principal components computed from the training data; this step was implemented in Python (see Appendix A).
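A minimal sketch of this reduction step, assuming that the feature matrices X_train, X_val, and X_test have already been built; passing a float to scikit-learn's PCA keeps the smallest number of components explaining the requested fraction of variance.

from sklearn.decomposition import PCA

pca = PCA(n_components=0.95)               # keep 95% of the training-set variance
X_train_red = pca.fit_transform(X_train)   # PCA is fitted on training data only
X_val_red = pca.transform(X_val)           # validation data projected onto the same components
X_test_red = pca.transform(X_test)         # test data projected onto the same components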

3. Experiments, Results, and Discussion

Experiments were carried out in order to compare the accuracy of test data classification using selected methods of feature extraction and classification. All computations were performed with the Python 3.5 programming language. The most important libraries used are scikit-learn, TensorFlow, and Keras [86,87,88].
First, the obtained dataset was randomly divided into training data, validation data, and test data in proportions of 70%, 10%, and 20%, respectively. The code snippet is shown in Appendix A.
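For orientation, a sketch of such a split using two chained train_test_split calls; X and y stand for the feature vectors and the class labels, and the random seed is illustrative.

from sklearn.model_selection import train_test_split

# first set aside the 20% test portion, then split the remainder into 70%/10% of the full set
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.125, random_state=42)
# 0.125 of the remaining 80% corresponds to 10% of the whole dataset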
It should be noted that for each feature extraction scenario, two different schemes were computed. In the case of k-NN and SVM classifiers, the validation step was omitted, and validation data were used for training. Thus, PCA was performed on a total of 80% of available data for the k-NN and SVM classifiers, and 70% of available data for neural networks. After dimensionality reduction, the lengths of feature vectors for each scheme amounted to
  • ar16: 38 in both cases,
  • ar24: 61 in both cases,
  • welch16: 61 in both cases,
  • welch32: 110 in both cases,
  • welch64: 204 in both cases,
  • dwt: 1019 for k-NN and SVM, 1016 for neural networks, and
  • dwt_stat: 136 in both cases.
Moreover, 10-fold cross-validation was executed to estimate further how the model is expected to perform on unseen data. These results are shown for comparison with the training data/validation/test scheme, but only for the best/worst feature extraction method/classifier variants.

3.1. Experiment 1: k-Nearest Neighbors

In the first experiment, k-NN classifiers were trained for chosen values of k using 80% of available data. The remaining 20% of data was used for testing. Accuracy was used as an effectiveness measure. Precision, recall, F1 score, and confusion matrices were used as auxiliary score measures. Code snippets for training classifiers, test data classification, and computing score measures are shown in Appendix A.
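A sketch of the k-NN training and scoring loop assumed here (the reduced feature matrices are placeholders); the metric functions are the standard scikit-learn ones.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

for k in (11, 14, 17):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_red, y_train)
    y_pred = knn.predict(X_test_red)
    acc = accuracy_score(y_test, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average=None)
    cm = confusion_matrix(y_test, y_pred, normalize='true')   # row-normalized confusion matrix
    print(k, acc)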
The results obtained in this experiment are summarized in Table 2 and discussed throughout this Section. The best individual scores for the given feature extraction scheme and the best mean score from all feature extraction schemes for a given k value are highlighted in bold.
In the conducted experiment, the highest classification accuracy of 63.86% was achieved for the welch32 scheme combined with the value of k = 11. Likewise, mean classification accuracy was also highest for the welch32 scheme. In general, schemes based on Welch’s method proved to be most effective. Although welch32 and welch64 schemes led to slightly better results than welch16, considering both average and individual scores, all three of them achieved the mean value of accuracy over 60%. Feature extraction schemes based on other used methods failed to get close to that score.
Autoregressive modeling-based schemes ar16 and ar24 achieved classification accuracy at the level of 50%. Interestingly, using the ar24 scheme resulted in slightly lower classification accuracy than using ar16. This shows that increasing the number of features may not provide higher accuracy.
Surprisingly, poor results were achieved using wavelet-based feature extraction schemes. The dwt scheme proved to be the least effective one in this experiment. Slightly better results, though still weak, were achieved with the dwt_stat scheme. A possible explanation for the poor performance of the dwt scheme may be overly high dimensionality of feature vectors, as dimensionality is thought to be particularly problematic in k-NN classifiers [89,90].
In the case of autoregressive modeling-based and wavelet transform-based feature extraction schemes, the best results were achieved with k = 17, the highest of used values. Welch method-based schemes were more effective with k = 11 and k = 14. It must be noted that the impact of the value of k on classification accuracy turned out to be small in comparison to the impact of the feature extraction scheme.
To find out the statistical significance of the results presented in Table 2, a series of statistical tests was conducted. The approach employed for this purpose is a mixed linear model (MLM) [91]. Statistical testing with the use of MLMs allows testing of observations that are statistically dependent. In the case of the data from Table 2, we test the difference of means obtained by the k-NN classifier with different types of feature extraction schemes. The averaging process is conducted over the set of values obtained for different values of k. The use of MLMs also allows the analysis of dependent observation vectors of unequal length. This feature is important in the context of Experiments 2 and 3, which have tables of results with missing values. For the calculation of MLMs, an implementation of this method provided in the Python statsmodels package [92] was employed. Columns from Table 2 were treated as dependent vectors of observations; thus, the test describes the difference in performance of the k-NN algorithm for each type of input data preprocessing, and this difference is observed over a set of varied values of the k-NN hyperparameter k. The results of the test procedure are shown in Table 3. The analysis quantifies the influence of each feature extraction scheme on the mean value of accuracy shown in Table 2. The reference, which also defines the values observed for the Intercept row of the table, is the welch32 algorithm, which was found to provide the highest mean accuracy calculated as a mean of performance over all variants of the k-NN algorithm.
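A sketch of such a fit with statsmodels, assuming the accuracies from Table 2 have been reshaped into a long-format pandas DataFrame df with columns 'accuracy', 'scheme', and 'k'; the reference level and grouping follow the description above.

import statsmodels.formula.api as smf

# df: one row per (feature extraction scheme, k) pair
model = smf.mixedlm("accuracy ~ C(scheme, Treatment(reference='welch32'))",
                    data=df, groups=df["k"])
result = model.fit()
print(result.summary())   # z statistics, p-values, and confidence intervals per scheme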
Results of the analysis shown in Table 3 lead to the conclusion that all Welch-based classifiers had similar performance, and there are no statistically significant differences between them. This conclusion may be drawn both from the value of the z statistic with the associated p-value and from the confidence interval values, which are negative for the left boundary and positive for the right boundary. The significance level was assumed to be equal to the standard value of 0.05. The influence of the remaining algorithms is negative, and the worst performance is found in the case of the dwt-based parameterization method, which, even in the most positive case of a value retrieved from the right boundary of the confidence interval, is worse than the left boundary of all other algorithms. Therefore, it can be concluded that the best performing group of parameterizations is the one based on the Welch method, and there were no significant differences between the algorithms from this group.
Below, a detailed discussion on examples of feature extraction schemes and classifier scenarios is shown. In Table 4 (left), a normalized confusion matrix for the 11-NN classifier and the welch32 feature extraction scheme is shown. Observations belonging to the meditation class were mostly correctly classified, while observations belonging to the music video and logic game classes were often confused with each other. Such a result is somewhat expected, as both watching the music video and solving logic puzzles involve a certain level of mental stimulation and require focusing the subject’s attention. Meditation, as the activity most different from the others, proved to be the easiest one to classify correctly. Confusion matrices for 11-NN welch16 and 11-NN welch64 (not presented in the article) contain very similar values. On the right side of Table 4 (right), a normalized confusion matrix for the 17-NN classifier and ar16 feature extraction scheme is shown. Again, the frames belonging to the meditation class are mostly correctly classified. Observations belonging to the logic game class are sometimes assigned to two remaining classes. Observations belonging to the music video class are least often correctly classified ones—only 32% of the observations of this class were correctly recognized. As many as 43% of the music video observations were misclassified as meditation. The confusion matrix for 17-NN ar24 (not shown in the article) contains very similar values.
In Table 5, normalized confusion matrices for 17-NN dwt and 17-NN dwt_stat scenarios are shown. These matrices differ greatly. In the case of 17-NN dwt, most observations of all classes have been recognized as logic game, a much lesser part as music video, and the least part as meditation. In the 17-NN dwt_stat scenario, the meditation observations were mostly correctly classified, while logic game and music video were assigned in different proportions to all classes, however most often to the meditation class.
In Table 6, values of precision, recall, and F1 score for chosen scenarios are shown. Precision for a given class is defined as the ratio of the number of observations correctly assigned by a classifier to that class (true positives) to the number of all observations assigned by a classifier to that class (sum of true and false positives). Recall for a given class is defined as the ratio of true positives to the number of all observations belonging to that class (sum of true positives and false negatives). The F1 score is defined in the following way:
F1 = 2 · (precision · recall) / (precision + recall)
In the case of the data from Table 6, we also employed a series of statistical tests to find out the statistical significance of the obtained results. All confusion matrices used for the calculation of precision, recall, and F1 score were also subjected to the chi-square test, which is used to determine whether the unevenness of the value distribution in a given contingency table is due to purely random chance or is caused by some external factor. Confusion matrices in this context can be treated as a special case of contingency tables. For Table 6, only one result was found to be statistically insignificant and thus not recognized by the classification algorithm—the music video class in the case of the 17-NN dwt algorithm. The value of the test statistic was equal to 3.634, and thus the p-value is equal to 0.056. If the significance level of 0.05 is considered, the result of the classifier is equivalent to random assignment to the class, and the result is statistically insignificant. For the rest of the classifiers, the results are statistically significant. A Holm–Bonferroni correction for multiple testing was applied to the outcomes of the three consecutive tests conducted for each of the classes.
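As an illustration, the chi-square test for a single class can be run on a 2×2 table derived from the confusion matrix (class membership vs. predicted membership); this construction is our assumption, and the exact tables used in the study may differ.

import numpy as np
from scipy.stats import chi2_contingency

def class_chi2(conf_matrix, class_idx):
    # build a 2x2 contingency table for one class from the (3x3) confusion matrix
    cm = np.asarray(conf_matrix)
    tp = cm[class_idx, class_idx]
    fn = cm[class_idx].sum() - tp
    fp = cm[:, class_idx].sum() - tp
    tn = cm.sum() - tp - fn - fp
    stat, p_value, dof, _ = chi2_contingency(np.array([[tp, fn], [fp, tn]]))
    return stat, p_value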
As mentioned earlier, results obtained with the use of the first validation scheme (training/validation/test or training/test) were compared to the outcomes of 10-fold cross-validation (2nd scheme). A confidence interval (α = 0.95) was calculated for the vector of values provided by the cross-validation procedure. Differences between the scores of both validation schemes are considered statistically significant if the score from the first scheme falls outside the confidence interval. Calculations were performed with the use of the R language. For the calculation of confidence intervals, the DescTools library was employed [93].
If, in the 1st validation scheme, the result is outside the confidence interval, then the difference between this result and the nearer boundary of the confidence interval is taken into account. In our further discussion, if a performance measure value from the study based on the 1st scheme is below the lower boundary of the confidence interval, we will report an increased performance in the case of the cross-validation and provide the difference of performances according to the following formula,
ΔM_p = CI_L − M_3sets,
where ΔM_p is the difference between measures, which can be accuracy, precision, recall, or F1; CI_L is the value of the lower boundary of the confidence interval calculated for results from the cross-validation-based assessment; and M_3sets is the value of the measure based on the assessment employing a single random division into training, validation, and test sets.
The formula is applied only if CI_L > M_3sets. If a degradation of performance is observed, then another formula is employed for reporting the result:
ΔM_p = M_3sets − CI_U,
where CI_U denotes the upper boundary of the confidence interval derived from the outcomes of the cross-validation-based benchmark. This equation is applied only if M_3sets > CI_U.
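The comparison rule can be expressed compactly as in the sketch below; note that the study computed the confidence intervals in R with DescTools, so this Python version with a t-based interval is only an approximation of that procedure.

import numpy as np
from scipy import stats

def compare_to_cv(m_3sets, cv_scores, confidence=0.95):
    # cv_scores: the 10 per-fold values of a given measure from cross-validation
    cv_scores = np.asarray(cv_scores)
    ci_low, ci_up = stats.t.interval(confidence, len(cv_scores) - 1,
                                     loc=cv_scores.mean(), scale=stats.sem(cv_scores))
    if ci_low > m_3sets:     # cross-validation performed better: delta = CI_L - M_3sets
        return 'increase', ci_low - m_3sets
    if m_3sets > ci_up:      # cross-validation performed worse: delta = M_3sets - CI_U
        return 'decrease', m_3sets - ci_up
    return 'not significant', 0.0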

Experiment 1

For comparison, 10-fold cross-validation was performed for the best (11-NN welch32) and the worst (17-NN dwt) training/test scheme in Experiment 1. The results are contained in Table 7. Comparing precision, recall, and F1 score metrics of the two test schemes, they are quite similar for 11-NN welch32. However, they differ for the logic game and music video for the 17-NN dwt. In the case of 10-fold cross-validation and 11-NN welch32, most of the observations belonging to the meditation class are classified correctly, while observations of remaining classes are assigned in nearly the same proportions to all classes. The statistical analysis performed for the comparison purpose between validation schemes is shown further on.
In the case of the worst scenario, when the DWT-based parametrization is considered, the accuracy value obtained in the 1st scheme was found to be within the confidence interval calculated from the results of cross-validation, thus there were no statistically significant differences between two approaches. Similarly, no such differences were found for the precision measure. However, for the recall measure, we found that the classifier performed significantly worse for the logic game class; the upper boundary of the confidence interval is 0.098 smaller than the result from the 1st scheme. However, the same classifier performed better in the case of the music video, and the improvement is very similar to the reduction of performance in the case of the logic game (i.e., 0.095). Obviously, a similar pattern can be observed for the F1 measure, which is derived from precision and recall. Performance for the logic game is statistically significantly worse (i.e., 0.018), and performance for music video increased by 0.048.
In the case of the best performing algorithm (based on the Welch method), we also did not find accuracy to be statistically different in both scenarios. Differences were observed for all remaining measures. For precision measure, an increase in performance was found for the logic game (i.e., 0.0158) and degradation for the music video class by 0.007. For the recall measure, performance degraded by 0.0445 for the logic game and increased by 0.0546 for the music video. For the F1 measure, performance for logic game degraded by 0.0052, and increased for meditation by 0.0028 and for music video by 0.0154.
Observed changes of performances were statistically significant, but it is worth mentioning that in some cases, the difference between values from the 1st scheme and the closest boundary derived from the cross-validation assessment is small (smaller than 0.01).

3.2. Experiment 2: Support Vector Machines with a Linear Kernel

In the second experiment, the accuracy of classification with support vector machines was tested. A linear function was used as the kernel. The values of the penalty parameter C used were 0.01, 0.1, 1, 10, and 100. For some combinations of the C parameter value and feature extraction scheme, experiments were not conducted because of very long computation times and poor results achieved for the given scheme in conjunction with other values of C. Data used for training and testing were the same as in Experiment 1. The code for training and testing the classifiers is contained in Appendix A.
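A sketch of the linear-kernel SVM loop assumed here, using scikit-learn's SVC; the reduced feature matrices are placeholders.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

for C in (0.01, 0.1, 1, 10, 100):
    svm = SVC(kernel='linear', C=C)
    svm.fit(X_train_red, y_train)
    acc = accuracy_score(y_test, svm.predict(X_test_red))
    print(f"C={C}: accuracy={acc:.4f}")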
The results are shown in Table 8. The best individual scores for the given feature extraction scheme and the best mean score from all feature extraction schemes for a given C value are highlighted in bold. The highest value of accuracy was achieved for the welch32 feature extraction scheme combined with the value of C = 1. It amounted to 66.71%, which is almost three percentage points higher than the best result in Experiment 1. The best mean value of the classification accuracy was achieved for the welch64 feature extraction scheme, although the score obtained with welch32 was not much worse. The highest mean scores of all feature extraction schemes were acquired for C = 10 and C = 100. This is probably because experiments were not conducted for wavelet-based feature extraction schemes for these values, which would otherwise lower the mean scores.
Both the best individual and mean scores turned out to be slightly better than the scores obtained in Experiment 1. Nevertheless, similar conclusions can be drawn from both experiments. Welch’s method again turned out to be the best parametrization method in terms of both individual best and mean scores. The dwt scheme again turned out to be the least effective one. The main difference in the results of both experiments is that in Experiment 2, applying the ar24 scheme resulted in higher accuracy scores than using the ar16 scheme. The most substantial improvement in results was obtained for the dwt_stat scheme.
Similarly to the first experiment, an MLM-based analysis was also applied to the data from Table 8. The results of this analysis are presented in Table 9. In this case, the welch64 algorithm was employed as the reference.
Again, similar to the outcomes of the first experiment, Welch method-based algorithms performed similarly, and there were no statistically significant differences in their performance. The rest of the algorithms performed worse than the reference algorithm. The worst performance is associated with the dwt algorithm.
Moreover, accuracies for the case of SVM (linear kernel) in 10-fold cross-validation were obtained for welch32 (the best performance) and dwt (the worst outcome) feature extraction variants. The results are shown in Table 10. Comparing these values with Table 8, one can observe that they are quite similar, though accuracy values are lower in the 10-fold cross-validation scheme. Again, the formal approach to statistical analysis will be shown at the end of this Section.
In Table 11 and Table 12, the normalized confusion matrices for welch32, ar16, dwt, and dwt_stat feature extraction schemes are shown. For the first three variants, confusion matrices are very similar to the ones obtained in Experiment 1. For the dwt_stat scheme, improvement in classification scores for observations belonging to logic game and music video in comparison to scores from Experiment 1 can be noted.
In Table 13, values of precision, recall, and F1 score for welch32, ar16, dwt, and dwt_stat schemes are presented. In all variants, with the exception of dwt values, all aforementioned measures are highest for the meditation class and lowest for the music video class.
Outcomes from Table 13 were also tested with the chi-square statistical test. Results from all variants were found to be statistically significant with the exception of dwt and C = 0.01. In this case, the statistic for meditation class was smaller than 0.001, which resulted in a p-value of 0.993; the logic game was associated with a test statistic of 0.629, which resulted in a p-value of 0.812; and the music video was associated with a test statistic value of 0.582 and a p-value of 0.812. Therefore, classification in this variant is equivalent to the class assignment done randomly, and outcomes are statistically insignificant.
In Table 14, precision, recall, and F1 score are shown for a 10-fold cross-validation scheme. Comparing these values with Table 13, one may observe that the metric values obtained for all classes look very similar; however, the statistical analysis shown below details whether the differences are statistically significant.
For the feature extraction method associated with the worst performance (based on DWT) we found that there were no statistically significant differences between the 1st scheme and the cross-validation based benchmarks. For the best performing scenario (based on welch32), we may observe that most of the results differ in a statistically significant way; however, some of the differences in performances are very small (smaller than 0.01). Overall accuracy was found to be lower for cross-validation (0.0012). Precision also provided smaller values for cross-validation (i.e., 0.0027). For recall, the performance also decreased for logic game and meditation (by 0.0206 and 0.0012, respectively). The performance for meditation increased by 0.0012. Degradation (0.0156) of the F1 score was observed for the logic game, and an increase of 0.003 in the F1 value was found for the meditation class.

3.3. Experiment 3: Support Vector Machines with Radial Basis Function Kernel

Experiment 2 was repeated using a radial basis function (RBF) as the kernel. The utilized values of the RBF parameter γ were 0.1, 1, and 10. The code for training the classifiers and test data classification is shown in Appendix A. A summary of the results is presented in Table 15. The best individual scores and the best mean score for the given feature extraction scheme, C, and γ are highlighted in bold.
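Analogously to the linear case, a sketch of the RBF-kernel grid over C and γ assumed here:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

for C in (0.01, 0.1, 1, 10, 100):
    for gamma in (0.1, 1, 10):
        svm = SVC(kernel='rbf', C=C, gamma=gamma)
        svm.fit(X_train_red, y_train)
        acc = accuracy_score(y_test, svm.predict(X_test_red))
        print(f"C={C}, gamma={gamma}: accuracy={acc:.4f}")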
The highest individual classification accuracy was again achieved for the welch32 scheme, with parameters C = 10 and γ = 10. It amounted to 69.33%, a result that is over 2.5 percentage points better compared to the best linear SVM score and 5.5 percentage points better than the best k-NN score. As in previous experiments, the scores obtained using the Welch method, in particular in the welch32 and welch64 variants, turned out to be much higher than with the other methods. On the other hand, the mean accuracy scores over all C and γ values are not much higher for the welch32 and welch64 variants in comparison to the values obtained in Experiment 2. Moreover, for the autoregressive modeling-based schemes and the dwt_stat variant, mean accuracy scores turned out to be much lower in comparison to the scores obtained in previous experiments. This is due to the greater influence of C and γ on the accuracy scores. In the previously tested classifiers, changing the values of the k and C parameters had a small impact on the accuracy of the classification. In the present experiment, the classification accuracy for the ar16 scheme with the parameters C = 1 and γ = 0.1 was 52.86%. After changing the value of γ to 10, the classification accuracy amounted to only 33.33%. The difference is, therefore, almost 20 percentage points. As seen from Table 15, most of the used combinations of C and γ values resulted in relatively low classification accuracy compared to the maximum values, both in this experiment and in the previous ones, for a given feature extraction scheme. This explains the low average values of classification accuracy and indicates the need for fine-tuning of the SVM classifier parameters when using the RBF kernel.
In the performed experiment, the feature extraction scheme yielding the poorest results turned out to be the dwt_stat scheme. The highest classification accuracy for this scheme decreased by 12 percentage points compared to the linear SVM classifier and by six percentage points compared to the k-NN classifier. Using the radial basis function kernel results in a decision boundary whose shape is very different from the hyperplane decision boundary of the linear SVM classifier. The decision boundary of the k-NN classifier at high k values may converge to a hyperplane, which explains the similarity of the results for the k-NN and the linear SVM classifier. A different shape of the achievable decision boundaries may result in better classification results on some datasets, but worse on others.
Results of the statistical MLM-based analysis of the outcomes of the third experiment, computed for the data from Table 15, are presented in Table 16. In this case, the welch64 algorithm was again treated as the reference.
Similarly to the previous two experiments, no significant differences were observed for the Welch-based algorithms, and the worst performance was found in the case of the dwt algorithm. However, the difference in performance between dwt and other algorithms, such as ar16 and ar24, is not as prominent as in the previous experiments. In their case, the pessimistic (worst-case) performance is similar to the pessimistic performance of the dwt algorithm.
A 10-fold cross-validation was also performed for the SVM with the radial kernel function for two feature schemes selected based on the accuracy results (the lowest and the highest accuracies); the outcomes are contained in Table 17. Therefore, the dwt_stat (C = 0.01, γ = 0.1) and welch32 (C = 10, γ = 10) cases were examined. Comparing Table 15 and Table 17, one can see that the results look very similar, though the accuracy for the worst-performing algorithm (based on dwt_stat) degraded by 0.0703.
In Table 18 and Table 19, the normalized confusion matrices for feature extraction schemes welch32, ar16, dwt, and dwt_stat with the best parameter combinations are shown. The confusion matrix for the welch32 scheme is very similar to the confusion matrix obtained for that scheme in previous experiments. The majority of meditation frames are correctly classified, while the other two categories are sometimes confused with each other.
The error matrix for the ar16 variant has values similar to those in the previous experiment. Classification accuracy for logic game and music video frames increased, while the accuracy for the meditation class decreased.
The confusion matrix for the dwt scheme, in turn, differs greatly from the matrices obtained in previous experiments, in which most of the observations were classified into the logic game or music video classes and very few observations into the meditation category. In the present experiment, most of the observations belonging to the meditation class are classified correctly, while observations of remaining classes are assigned in different proportions to all classes, but most often to the class meditation.
In the case of the dwt_stat scheme, observations belonging to the logic game and music video classes are assigned to three classes roughly equally. Observations of the meditation class are in half of the cases mistakenly assigned to other classes.
In Table 20, values of precision, recall, and F1 score for each signal class for the welch32, ar16, dwt, and dwt_stat feature extraction schemes are presented. For welch32 and ar16, the values of all measures are the highest for the meditation class and the lowest for the music video class. Note the relatively low precision for the meditation class and the dwt scheme.
After the statistical testing process with chi-square test, all differences presented in Table 20 were found to be statistically significant.
In Table 21, values of precision, recall, and F1 score are shown for 10-fold cross-validation for the best and worst results of the training/validation/test scheme, as shown in Table 20. For welch32, resulting metrics are very similar. For dwt_stat feature extraction scheme, values of all measures are lower. Again, the statistical analysis was performed showing which differences are statistically significant.
Degradation was found for all classes in terms of precision. The logic game deteriorated by 0.1526, meditation by 0.1661, and music video by 0.2089. No differences were found for the recall measure. In terms of the F1 measure, degradation was observed for the meditation class (i.e., 0.0838) and music video (i.e., 0.1114).
Only two statistically significant differences were found for the best performing algorithm (based on Welch’s method). Both are associated with the recall measures. For the logic game, performance dropped by 0.0071, and there was an increase of 0.0033 for meditation. It is worth noting that these are low values compared to the magnitude of performance changes in other algorithms.

3.4. Experiment 4—Neural Networks

In the last experiment performed, the accuracy of classification using neural networks was examined. Neural networks, belonging to the class of deep learning classifiers, with a single hidden layer with the ReLU activation function [94,95] and the softmax activation function in the output layer, were used. Weights were initialized with the He method (parameter kernel_initializer='he_uniform') [96], while biases were initialized with zeros. The Nesterov gradient method was used for training the network [97]. The learning rate parameter was set to 0.01 with a decay of 10^−6 per epoch. Momentum was set to α = 0.9. To prevent overfitting, early stopping with a patience of 50 epochs was used. This parameter refers to the number of epochs to wait before stopping early if no progress on the validation set is achieved. The maximum possible number of learning epochs was set to 2000. The results are presented in Table 22. The code used for training the networks is provided in Appendix A; an illustrative sketch of the training setup is also shown below.
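The sketch below reconstructs this setup with tf.keras; the hidden-layer width, the one-hot label variables, and the handling of the 10^−6 decay (newer Keras releases expose it through a learning-rate schedule or the legacy SGD optimizer) are assumptions rather than the exact code used.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    Dense(128, activation='relu', kernel_initializer='he_uniform',
          input_shape=(n_features,)),                                # single hidden layer
    Dense(3, activation='softmax', kernel_initializer='he_uniform')  # three mental-state classes
])
# Nesterov SGD; the 1e-6 per-epoch decay can be added via a learning-rate schedule
opt = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)
model.fit(X_train_red, y_train_onehot, validation_data=(X_val_red, y_val_onehot),
          epochs=2000, callbacks=[early_stop])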
For the autoregressive modeling-based and wavelet transform-based methods, the results obtained were similar to the results obtained with linear SVMs, while for Welch’s method, the obtained accuracy was even higher than in previous experiments. Again, the welch32 scheme, for which classification accuracy higher than 70% was achieved for the first time, turned out to be the best option. The code employed to achieve these outcomes is provided in Appendix A.
Values from Table 22 were also subjected to statistical testing. For this purpose, an ANOVA analysis could be employed; however, first, a Levene test for homogeneity of variance should be performed. The value of the test statistic was equal to 0.883, and thus the p-value was equal to 0.512. Therefore, all variances of the observation vectors gathered for each algorithm can be assumed to be equal. Next, a series of Shapiro–Wilk tests was conducted to test the second assumption of the ANOVA test, which is the Gaussian distribution of observations. For all but one algorithm, the p-value of the Shapiro–Wilk test was in the range between 0.157 and 0.629. However, for the dwt algorithm, the p-value of the Shapiro–Wilk test was equal to 0.016, and therefore, it is concluded that one of the observation vectors does not have a Gaussian distribution, and the ANOVA test cannot be performed. The p-values of the Shapiro–Wilk test were corrected for multiple testing with a Holm–Bonferroni correction. Instead of ANOVA, the Kruskal–Wallis nonparametric alternative has to be conducted. The statistic of the Kruskal–Wallis test is in this case equal to 67.454, and thus the p-value is smaller than 0.001; hence, the differences between the medians of the results obtained by each algorithm are statistically significant in the case of at least one pair of algorithms. To find such pairs, the Dunn post-hoc test is conducted. The matrix of p-values of the Dunn test is presented in Table 23.
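A sketch of this test sequence with scipy.stats and the scikit-posthocs package (assumed here for the Dunn test); results is a placeholder dictionary mapping each feature extraction scheme to its vector of accuracies.

from scipy.stats import levene, shapiro, kruskal
import scikit_posthocs as sp

groups = list(results.values())

print(levene(*groups))                # homogeneity of variances (Levene test)
print([shapiro(g) for g in groups])   # per-scheme normality (Holm-Bonferroni applied afterwards)
print(kruskal(*groups))               # nonparametric alternative to one-way ANOVA
print(sp.posthoc_dunn(groups))        # matrix of pairwise Dunn post-hoc p-values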
Ar16 and ar24 performed similarly, and no statistical difference was found between the performance of those two algorithms. The behavior of the group of Welch-based algorithms was close to that of the ar-based schemes; however, a statistically significant difference was found between the welch16 and welch32 algorithms. The dwt and dwt_stat algorithms also performed in a similar manner, and no statistically significant differences in performance were found in their case.
Due to the fact that in each of the conducted experiments the welch32 scheme provided the best results, the further part of the experiments focused on tuning the neural network to obtain the best possible outcome with this scheme.
The general specification of the neural networks for which the best results were obtained is presented in Table 24. All described networks have output layers consisting of three neurons with the softmax activation function. In all networks, weights were initialized with the He method, while biases were initialized with zeros. The Nesterov gradient method was used to train the networks. The best result of all performed experiments is marked in bold. While using the 10-fold cross-validation scheme, the accuracy for the best neural network configuration resulted in a value of 0.7412. Thus, the outcome is very similar in both testing/validation schemes.
In Table 25, the normalized error matrix for the best neural network is presented. It can be seen that the better classification results compared to the SVM (RBF) classifier are due to the higher sensitivity for the music video category. Sensitivity for the other classes remained at a similar level. The left side of Table 25 shows results for the training/validation/test scheme, whereas the outcomes of 10-fold cross-validation are contained on the right side. As seen in Table 25, the above conclusions are valid for both testing schemes.
In Table 26, values of precision, sensitivity, and F1 score for each class for the best neural network configuration are shown. Similarly, as in the previous experiments, scores are the highest for the meditation class and the lowest for the music video class. Noteworthy is a considerable increase in the value of measures, primarily sensitivity, for the music video class, 13 percentage points compared to SVM-RBF, and 21 percentage points compared to k-NN. All values from Table 26 were found to be statistically significant after conducting the chi-square test.
Similarly, for the same NN configuration and the welch32 feature extraction scheme, 10-fold cross-validation was performed, and the resulting metrics are shown in Table 27.
The accuracy increased in the cross-validation-based study. The difference between the original performance from the 1st scheme assessment and the lower boundary of the confidence interval for the cross-validation-based study is 0.0318. For precision, drops were observed for the logic game (by 0.027) and for the music video (by 0.0014). For recall, an increase of 0.011 was observed for meditation and a decrease of 0.014 for the music video. For the F1 measure, performance for meditation increased by 0.0041 and dropped by 0.0068 for music video.

Summary

The translation of performance from the evaluation based on three subsets to the assessment based on cross-validation differed in the case of all seven algorithms. Some of the changes were statistically significant, but the difference between the boundary of the confidence interval and the value of the measure calculated based on the 1st scheme was very modest (smaller than 0.01). Some changes were very pronounced; an example is the dwt_stat-based scenario from Experiment 3. There were feature/classification algorithm scenarios that performed identically in terms of the proposed analysis, an example of which is the one based on dwt from Experiment 2. This can be a vital indication of how well each feature extraction scheme/algorithm can generalize when tested on data from other datasets and how reliable and reproducible these effects are.
It should be noted that applying the techniques listed below did not improve or even worsened the classification accuracy:
  • adding more hidden layers,
  • using parametric ReLU activation function,
  • using adaptive optimization methods like Adam,
  • adding batch normalization or dropout layers,
  • adding L1 or L2 weight decay,
  • adding additional features: skewness, kurtosis, and energy computed for every channel from raw, unprocessed frames.
In Figure 2 and Figure 3, the first two principal components of the training and test data sets parameterized with the welch32 scheme are plotted. The first two principal components, in this case, are responsible for 12.87% and 4.33% of the training dataset variance, respectively. It is possible to draw the decision boundary in such a way that most observations belonging to the classes meditation and logic game are correctly classified. Observations belonging to the music video class are problematic because they mix with observations of the other classes, in particular with the observations of the logic game class. In order to further improve the accuracy of classification, the critical issue is finding features that will enable separating observation of the music video class from the observation of the other two classes.

4. Conclusions

The aim of this study was to compare the effectiveness of selected methods of signal analysis and classification in the task of recognizing three mental states—meditation, logic game, and music video—based on a recorded EEG signal. The data were preprocessed by employing independent component analysis. For the parametrization of the signal, autoregressive modeling, Welch's method, and the discrete wavelet transform were used. Feature vectors were reduced by principal component analysis. The classification was performed employing k-nearest neighbors, support vector machines, and neural networks (with three hidden layers and the LeakyReLU activation function).
Among the methods of signal analysis tested in the investigation carried out, the best results were achieved with Welch's method, while the neural network turned out to be the most effective classifier. The choice of the parameterization method turned out to have a much greater influence on the final accuracy of classification than the choice of a classifier. The same trend in the metrics was also obtained while utilizing the 10-fold cross-validation scheme. We can see that, in terms of accuracy, satisfactory results appear in our study for the meditation phase, as they reach 90%. This means that several limitations of our approach should be overcome; some of them are listed below.
In the conducted experiments, autoregressive model coefficients were used as features. Another possible approach is to calculate an estimate of the power spectral density from the obtained autoregressive model. Other factors that have not been studied are the effect of the ICA algorithm used on the classification results [18,19,20,21,22,23,24,25,26], the effect of the initial removal of the constant component and the whitening of data frames, the effect of a longer data frame (also in the context of the compromise between the frame length and the number of training observations), the effect of the overlap length, and, in the case of the discrete wavelet transform, the effect of the wavelet used. It should also be noted that another dataset should be tested as a benchmark to avoid the problem that the results obtained are due to a combination of specific features or classification techniques [98,99,100,101]. As recalled in the introductory section, a variety of datasets are available to the public, so they may be utilized for this purpose, provided they have similar dataset features and formats. Testing the influence of all these factors is to be a further direction of our research.
The main factor limiting the accuracy of classification was the difficulty of separating the video class observations from those of other classes. Therefore, the need to develop a set of features allowing for better separation of classes should be researched. There is also a possibility of introducing an additional meditation phase between the music video phase and the logic game. This would probably allow for a better signal separation of these two active phases, and in consequence, in a more effective classification. Moreover, analyzing all the results, one may suppose that playing the logic game and watching the music video clip result in similar brain activity. If this is a case, two classes could be discerned, i.e., meditation/activity only. This is one of the future directions of this study.
Moreover, to determine the differences, another type of BCI headset, containing more measuring electrodes and offering better preprocessing, may be utilized [44]. Then, the problem of possibly overlapping brain signals in these two activities may be easier to resolve.
In addition, it was found that EEG signals respond differently to different types of music [102]; thus, it will be interesting to pursue this direction. This effect may also be person- and mood-dependent. That is why a questionnaire may be prepared asking about the subjects' music preferences and their mood when taking part in the tests.
However, when approaching the limitations of EEG signal analysis, and of building an effective BCI in general, one may refer to several additional experimental issues. Zhang et al. identified overfitting in electroencephalogram (EEG) classification as one of the essential limitations of using EEG for brain–computer interfaces (BCIs) [35]. Addressing it may require various regularization schemes, data augmentation, or using dropout in the NN model; a minimal regularization sketch is given below. Moreover, the effectiveness of the classification process depends to a large extent on the amount and quality of the prepared data (including both the selection of characteristics and their redundancy); thus, a variety of methods might be checked with different settings, and the classification outcomes determine the best configuration of the feature scheme/classification algorithm. Furthermore, for EEG signal analysis, 2D spectral representations may be used to augment data for deep learning classification. Another way of augmenting data is to utilize examples from similar, but not identical, datasets. This may lead to better generalization, as the network is exposed to more training examples, and may be realized through unsupervised pre-training or transfer learning. As pointed out by Han et al. [103], it is often reasonable to assume that the input–output mapping is similar across different models, so better NN performance may be obtained by fitting all the parameters at the same time. Lastly, since poor generalization ability still limits the broader use of BCI, deep learning could be employed in the form of, e.g., autoencoders without manual feature selection [32,35].
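One such mitigation could combine weight decay and dropout in the network described in Appendix A; a minimal sketch, assuming the reduced PCA features from Appendix A and with the dropout rate and L2 strength chosen purely for illustration:
    # sketch only: adding L2 regularization and dropout to the appendix network
    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.regularizers import l2

    n_inputs = reduced_train_data.shape[1]
    model = Sequential()
    model.add(Dense(n_inputs, activation='relu', input_dim=n_inputs,
                    kernel_regularizer=l2(1e-4)))   # assumed L2 strength
    model.add(Dropout(0.3))                         # assumed dropout rate
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])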

Supplementary Materials

Python code prepared for the experiments is available at https://multimed.org/research/sensors2020.zip.

Author Contributions

Conceptualization, J.B., A.K., and B.K.; Data curation, A.K.; Formal analysis, A.K. and B.K.; Investigation, B.K.; Methodology, J.B., A.K., and B.K.; Validation, A.K.; Visualization, J.B.; Writing—original draft, J.B.; Writing—review & editing, B.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We thank the anonymous reviewers whose comments helped to improve and clarify our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A contains simplified snippets of code utilized in experiments.
The FastICA algorithm is shown below:
    # independent component analysis with whitening
    # performed for a single frame
    from sklearn.decomposition import FastICA

    ica = FastICA(n_components=14, whiten=True)
    new_dataframe = ica.fit_transform(dataframe)
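In the experiments, this step was repeated for every frame of the recording; a minimal sketch of such a loop, assuming a hypothetical list frames holding the raw data frames and the FastICA import above:
    # sketch only: applying per-frame ICA to a whole recording
    preprocessed = [FastICA(n_components=14, whiten=True).fit_transform(f) for f in frames]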
The following code was employed for calculating the autoregressive models. Only the real parts of the computed model coefficients were utilized. After concatenating the model coefficients from all channels with the previously computed mean values and variances, final feature vectors of 252 elements were obtained.
    # computing autoregressive models for a single frame
    import spectrum

    new_dataframe = []
    for channel in dataframe:
        model = spectrum.arburg(channel, order=16, criteria=None)[0]
        model = [item.real for item in model]
        new_dataframe.append(model)
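For completeness, a sketch of how such per-channel coefficients could be combined with the pre-computed means and variances into the 252-element vector (14 channels × 16 coefficients plus 14 means and 14 variances); the names means and variances are assumed here:
    # sketch only: assembling the final ar16 feature vector for one frame
    import numpy as np
    feature_vector = np.concatenate([np.ravel(new_dataframe), means, variances])
    # len(feature_vector) == 14 * 16 + 14 + 14 == 252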
In the case of the welch16 parameterization method, samples from each channel in every frame were divided into eight non-overlapping subframes of 16 samples each. Subsequently, nine coefficients were obtained for every channel. Final feature vectors (with the pre-computed mean values and variances) contained 154 elements:
    # computing power spectral density with Welch's method for a single frame
    import scipy.signal

    psd = scipy.signal.welch(dataframe, nperseg=16, axis=0)[1]
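Analogously to the autoregressive case, the 154-element welch16 vector could be assembled as sketched below (14 channels × 9 PSD bins plus 14 means and 14 variances; means and variances are again assumed to be pre-computed):
    # sketch only: assembling the final welch16 feature vector for one frame
    import numpy as np
    feature_vector = np.concatenate([psd.flatten(order="F"), means, variances])
    # len(feature_vector) == 14 * 9 + 14 + 14 == 154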
Decomposition into wavelets was performed using the fourth-level discrete wavelet transform (the wavedec function from the pywt library). After concatenating the coefficient vectors with the pre-computed mean values and variances, final feature vectors of 2184 elements were obtained. A transcription of this algorithm is provided in the listing below:
    # computing discrete wavelet transform
    # for a single frame
    import pywt
    import numpy as np

    new_dataframe = []
    for channel in dataframe:
        dwt = pywt.wavedec(channel, wavelet="db4", level=4, axis=0)
        vector = np.ndarray((0,))
        for item in dwt:
            vector = np.append(vector, item)
        # append once per channel, after all coefficient levels have been concatenated
        new_dataframe.append(vector)
In the case of the dwt_stat parameterization method, for each of the five wavelet coefficient vectors, the mean value, the mean of the absolute values, the variance, the skewness, the kurtosis, the zero-crossing rate, and the sum of squares were computed:
    # computing descriptive parameters from wavelet coefficients
    # for a single frame
    import numpy as np
    import scipy.stats
    import pywt

    def zero_crossings(data):
        return ((data[:-1] * data[1:]) < 0).sum()

    dwt = pywt.wavedec(channel, wavelet="db4", level=4, axis=0)
    vector = []
    for item in dwt:
        vector.append(np.mean(item))
        vector.append(np.mean(np.abs(item)))
        vector.append(np.var(item))
        vector.append(scipy.stats.skew(item))
        vector.append(scipy.stats.kurtosis(item))
        vector.append(zero_crossings(item))
        vector.append(np.sum(np.power(item, 2)))
The dimensionalities of the feature vectors obtained with the aforementioned schemes were reduced via principal component analysis (PCA); this was performed in Python as follows:
    # dimensionality reduction via principal component analysis
    from sklearn.decomposition import PCA

    pca = PCA(n_components=0.95)  # keep the components explaining 95% of the variance
    pca.fit(train_data)
    reduced_train_data = pca.transform(train_data)
    reduced_val_data = pca.transform(val_data)
    reduced_test_data = pca.transform(test_data)
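A quick way to inspect how many components the 95% variance criterion retained (a usage sketch only):
    # sketch only: inspecting the reduced dimensionality
    print(pca.n_components_, pca.explained_variance_ratio_.sum())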
The division of the dataset into training, validation, and test subsets was performed as follows:
    # dividing the dataset into training, validation, and test data
    import random

    data_train = {}
    data_val = {}
    data_test = {}
    a = int(0.7 * 8265)
    b = a + int(0.1 * 8265)
    for class_name in eeg.keys():
        # eeg.keys() are ['meditation', 'music_video', 'logic_game']
        indices = [i for i in range(8265)]
        random.shuffle(indices)
        data_train[class_name] = [eeg[class_name][i] for i in indices[:a]]
        data_val[class_name] = [eeg[class_name][i] for i in indices[a:b]]
        data_test[class_name] = [eeg[class_name][i] for i in indices[b:]]
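The 10-fold cross-validation results reported in the tables were obtained analogously; a minimal sketch of such a loop with scikit-learn, assuming a feature matrix X (NumPy array) and a label vector y built from the frames above (both names are assumed for illustration):
    # sketch only: stratified 10-fold cross-validation of a classifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = KNeighborsClassifier(n_neighbors=11)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))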
When training neural networks, the inputs and outputs of the network must be encoded as numeric vectors. An example of one-hot encoding is shown below:
    # encoding classes via one-hot encoding
    import numpy as np

    classes = {}
    classes['meditation'] = np.array([1, 0, 0])
    classes['music_video'] = np.array([0, 1, 0])
    classes['logic_game'] = np.array([0, 0, 1])
    for i in range(len(train_data_classes)):
        train_data_classes[i] = classes[train_data_classes[i]]
    for i in range(len(test_data_classes)):
        test_data_classes[i] = classes[test_data_classes[i]]
    for i in range(len(val_data_classes)):
        val_data_classes[i] = classes[val_data_classes[i]]
Code for training k-NN classifiers, test data classification, and computing score measures in Experiment 1 is shown below:
    # training k-NN classifiers, test data classification,
    # and computing accuracy, confusion matrix, precision, recall, and F1
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(reduced_train_data, train_data_classes)
    results = knn.predict(reduced_test_data)
    score = accuracy_score(test_data_classes, results)
    conf_matrix = confusion_matrix(test_data_classes, results)
    report = classification_report(test_data_classes, results)
The code for training and testing SVM classifiers (Experiment 2, linear kernel function) is shown below:
    # training SVM classifiers with linear kernel
    # and test data classification
    from sklearn.svm import SVC

    svm = SVC(C=C, kernel='linear')
    svm.fit(reduced_train_data, train_data_classes)
    results = svm.predict(reduced_test_data)
The code for training and testing SVM classifiers (Experiment 3, radial kernel function) is shown below:
    # training SVM classifier with radial basis function kernel
    # and test data classification
    from sklearn.svm import SVC

    svm = SVC(C=C, kernel='rbf', gamma=gamma)
    svm.fit(reduced_train_data, train_data_classes)
    results = svm.predict(reduced_test_data)
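The combinations of C and γ reported in Table 15 could equivalently be explored with scikit-learn's grid search; a sketch, assuming the reduced training data from above (the parameter grid mirrors the values used in Experiment 3):
    # sketch only: grid search over C and gamma for the RBF-kernel SVM
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    param_grid = {'C': [0.01, 0.1, 1, 10, 100], 'gamma': [0.1, 1, 10]}
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
    search.fit(reduced_train_data, train_data_classes)
    best_params = search.best_params_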
The code employed for training and testing neural networks (Experiment 4) is provided below:
    # training neural networks with a single hidden layer
    # and training data classification
    from keras.layers import Dense, Activation
    from keras.models import Sequential
    from keras.optimizers import SGD
    from keras.callbacks import EarlyStopping

    n_inputs = reduced_train_data.shape[1]
    model = Sequential()
    model.add(Dense(n_inputs, activation='relu', input_dim=n_inputs,
                    kernel_initializer='he_uniform', bias_initializer='zeros'))
    model.add(Dense(3, activation='softmax',
                    kernel_initializer='he_uniform', bias_initializer='zeros'))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    es = EarlyStopping(monitor='val_loss', mode='min', patience=50, restore_best_weights=True)
    history = model.fit(reduced_train_data, train_data_classes, batch_size=64,
                        validation_data=(reduced_val_data, val_data_classes),
                        callbacks=[es], epochs=2000)
    predictions = model.predict(reduced_test_data, batch_size=128)
    score = model.evaluate(reduced_test_data, test_data_classes, batch_size=128)
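The best-performing configuration from Table 24 (three hidden layers with the LeakyReLU activation, a = 0.2) is not listed above; a minimal sketch of how it could be built with the same optimizer settings (a hidden layer width equal to n_inputs is an assumption made for illustration):
    # sketch only: network with three hidden LeakyReLU layers (a = 0.2)
    from keras.layers import Dense, LeakyReLU
    from keras.models import Sequential
    from keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(n_inputs, input_dim=n_inputs, kernel_initializer='he_uniform'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(n_inputs, kernel_initializer='he_uniform'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(n_inputs, kernel_initializer='he_uniform'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                  metrics=['accuracy'])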

References

  1. Jiang, X.; Bian, G.B.; Tian, Z. Removal of Artifacts from EEG Signals: A Review. Sensors 2019, 19, 987. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Cabañero-Gómez, L.; Hervas, R.; Bravo, J.; Rodriguez-Benitez, L. Computational EEG Analysis Techniques When Playing Video Games: A Systematic Review. Proceedings 2018, 2, 483. [Google Scholar] [CrossRef] [Green Version]
  3. Machado, S.; Araujo, F.; Paes, F.; Velasques, B.; Cunha, M.; Budde, H.; Basile, L.F.; Anghinah, R.; Arias-Carrión, O.; Cagy, M.; et al. EEG-based Brain–computer Interfaces: An Overview of Basic Concepts and Clinical Applications in Neurorehabilitation. Rev. Neurosci. 2010, 21, 451–468. [Google Scholar] [CrossRef] [PubMed]
  4. Kaplan, P.; Sutter, R. Electroencephalographic patterns in coma: When things slow down. Epileptologie 2012, 29, 201–209. [Google Scholar]
  5. Kübler, A. Brain–computer interfacing: Science fiction has come true. Brain 2013, 136, 2001–2004. [Google Scholar] [CrossRef] [Green Version]
  6. Choubey, H.; Pandey, A. A new feature extraction and classification mechanisms for EEG signal processing. Multidim. Syst. Sign. Process. 2018, 30. [Google Scholar] [CrossRef]
  7. Gao, Z.; Wang, X.; Yang, Y.; Mu, C.; Cai, Q.; Dang, W.; Zuo, S. EEG-based spatio-temporal convolutional neural network for driver fatigue evaluation. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2755–2763. [Google Scholar] [CrossRef]
  8. Acharya, U.R.; Faust, O.; Kannathal, N.; Chua, T.J.; Laxminarayan, S. Dynamical analysis of EEG signals at various sleep stages. Comput. Methods Programs Biomed. 2005, 80, 37–45. [Google Scholar] [CrossRef]
  9. Kannathal, N.; Acharya, U.R.; Fadilah, A.; Tibelong, T.; Sadasivan, P.K. Nonlinear analysis of EEG signals at different mental states. Biomed. Eng. Online 2004, 3. [Google Scholar] [CrossRef] [Green Version]
  10. Nicolas-Alonso, L.F.; Gomez-Gil, J. Brain computer interfaces, a review. Sensors 2012, 12, 1211–1279. [Google Scholar] [CrossRef]
  11. He, B.; Gao, S.; Yuan, H.; Wolpaw, J.R. Brain–computer interfaces. In Neural Engineering; He, B., Ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 87–151. [Google Scholar] [CrossRef]
  12. Yuan, H.; He, B. Brain–computer interfaces using sensorimotor rhythms: Current state and future perspectives. IEEE Trans. Biomed. Eng. 2014, 61, 1425–1435. [Google Scholar] [CrossRef] [Green Version]
  13. Han, J.; Zhao, Y.; Sun, H.; Chen, J.; Ke, A.; Xu, G.; Zhang, H.; Zhou, J.; Wang, C. A Fast, Open EEG Classification Framework Based on Feature Compression and Channel Ranking. Front. Neurosci. 2018, 12. [Google Scholar] [CrossRef] [Green Version]
  14. Charles, W.A.; James, N.K.; O’Connor, T.; Michael, J.K.; Artem, S. Geometric subspace methods and time-delay embedding for EEG artifact removal and classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2006, 14, 142–146. [Google Scholar] [CrossRef]
  15. Mannan, M.M.N.; Kamran, M.A.; Kang, S.; Jeong, M.Y. Effect of EOG Signal Filtering on the Removal of Ocular Artifacts and EEG-Based Brain–computer Interface: A Comprehensive Study. Complexity 2018, 18–36. [Google Scholar] [CrossRef]
  16. Subasi, A. EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst. Appl. 2007, 32, 1084–1093. [Google Scholar] [CrossRef]
  17. Calderon, H.; Sahonero-Alvarez, G. A Comparison of SOBI, FastICA, JADE and Infomax Algorithms. In Proceedings of the 8th International Multi-Conference on Complexity, Informatics and Cybernetics (IMCIC 2017), Orlando, FL, USA, 21–24 March 2017. [Google Scholar]
  18. Himberg, J.; Hyvärinen, A. Icasso: Software for investigating the reliability of ICA estimates by clustering and visualization. In Proceedings of the IEEE 13th Workshop on Neural Networks for Signal Processing (NNSP’03), Toulouse, France, 17–19 September 2003; pp. 259–268. [Google Scholar]
  19. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; Wiley: Hoboken, NJ, USA, 2001. [Google Scholar]
  20. Parsopoulos, K.E.; Varhatis, M.N. Recent approaches to global optimization problems through particle swarm optimization. Nat. Comput. 2002, 116, 235–306. [Google Scholar] [CrossRef]
  21. Vigário, R.; Särelä, J.; Jousmäki, V.; Hämäläinen, M.; Oja, E. Independent component approach to the analysis of EEG and MEG recordings. IEEE Trans. Biomed. Eng. 2000, 47, 589–593. [Google Scholar] [CrossRef] [Green Version]
  22. James, C.J.; Gibson, O.J. Temporally constrained ICA: An application to artifact rejection in electromagnetic brain signal analysis. IEEE Trans Biomed. Eng. 2003, 50, 1108–1116. [Google Scholar] [CrossRef]
  23. Langlois, D.; Chartier, S.; Gosselin, D. An Introduction to Independent Component Analysis: InfoMax and FastICA algorithms. Tutor. Quant. Methods Psychol. 2010, 6, 31–38. [Google Scholar] [CrossRef] [Green Version]
  24. Palmer, J.A.; Kreutz-Delgado, K.; Makeig, S. AMICA: An Adaptive Mixture of Independent Component Analyzers with Shared Components; University of California: San Diego, CA, USA, 2011. [Google Scholar]
  25. Iriarte, J.; Urrestarazu, E.; Valencia, M.; Alegre, M.; Malanda, A.; Viteri, C.; Artieda, J. Independent component analysis as a tool to eliminate artifacts in EEG: A quantitative study. J. Clin. Neurophysiol. 2003, 20, 249–257. [Google Scholar] [CrossRef] [Green Version]
  26. Delorme, A.; Makeig, S. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 2004, 134, 9–21. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Chang, K.-M.; Lo, P.F. Meditation EEG interpretation based on novel fuzzy-merging strategies and wavelet features. Biomed. Eng. Appl. Basis Commun. 2005, 17, 167–175. [Google Scholar] [CrossRef] [Green Version]
  28. Jahankhani, P.; Kodogiannis, V.; Revett, K. EEG Signal Classification Using Wavelet Feature Extraction and Neural Networks. In Proceedings of the IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA’06), Sofia, Bulgaria, 3–6 October 2006; pp. 120–124. [Google Scholar]
  29. Suk, H.; Lee, S. A Novel Bayesian Framework for Discriminative Feature Extraction in Brain–computer Interfaces. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 286–299. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  30. Lotte, F. A Tutorial on EEG Signal Processing Techniques for Mental State Recognition in Brain–computer Interfaces. In Guide to Brain–computer Music Interfacing; Miranda, E.R., Castet, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  31. Wen, T.; Zhang, Z. Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals. IEEE Access 2018, 6, 25399–25410. [Google Scholar] [CrossRef]
  32. Zhang, X.; Yao, L.; Yuan, F. Adversarial Variational Embedding for Robust Semi-supervised Learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AL, USA, 4–8 August 2019; pp. 139–147. [Google Scholar] [CrossRef] [Green Version]
  33. Wu, Q.; Zhang, Y.; Liu, J.; Sun, J.; Cichocki, A.; Gao, F. Regularized Group Sparse Discriminant Analysis for P300-Based Brain–Computer Interface. Int. J. Neural Syst. 2019, 29, 6. [Google Scholar] [CrossRef]
  34. Edla, D.R.; Mangalorekar, K.; Dhavalikar, G.; Dodia, S. Classification of EEG data for human mental state analysis using Random Forest Classifier. Procedia Comput. Sci. 2018, 132, 1523–1532. [Google Scholar] [CrossRef]
  35. Zhang, X.; Yao, L.; Wang, X.; Monaghan, J.; McAlpine, D.; Zhang, Y. A Survey on Deep Learning based Brain–computer Interface: Recent Advances and New Frontiers. arXiv 2019, arXiv:1905.04149. [Google Scholar]
  36. Kurowski, A.; Mrozik, K.; Kostek, B.; Czyżewski, A. Comparison of the effectiveness of automatic EEG signal class separation algorithms. J. Intel. Fuzzy Sys. 2019, 10, 1–7. [Google Scholar] [CrossRef]
  37. Bashivan, P.; Rish, I.; Yeasin, M.; Codella, N. Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks. arXiv 2015, arXiv:1511.06448v3. [Google Scholar]
  38. Gonfalonieri, A. Deep Learning Algorithms and Brain–Computer Interfaces. Available online: https://towardsdatascience.com/deep-learning-algorithms-and-brain–computer-interfaces-7608d0a6f01 (accessed on 10 April 2020).
  39. Kurowski, A.; Mrozik, K.; Kostek, B.; Czyżewski, A. Method for Clustering of Brain Activity Data Derived from EEG Signals. Fundam. Inform. 2019, 168, 249–268. [Google Scholar] [CrossRef]
  40. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef] [Green Version]
  41. Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine learning: A review of classification and combining techniques. Artif. Intell. Rev. 2006, 26, 159–190. [Google Scholar] [CrossRef]
  42. Salzberg, S.L. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Min. Knowl. Discov. 1997, 1, 317–328. [Google Scholar] [CrossRef]
  43. Zheng., W.; Lu, B.L. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175. [Google Scholar] [CrossRef]
  44. Gu, X.; Cao, Z.; Jolfaei, A.; Xu, P.; Wu, D.; Jung, T.-P.; Lin, C.-T. EEG-based Brain–computer Interfaces (BCIs): A Survey of Recent Studies on Signal Sensing Technologies and Computational Intelligence Approaches and Their Applications. arXiv 2020, arXiv:2001.11337. [Google Scholar]
  45. Subha, D.P.; Joseph, P.K.; Acharya, U.R.; Min, L.C. EEG Signal Analysis: A Survey. J. Med. Syst. 2010, 34, 195–212. [Google Scholar] [CrossRef] [PubMed]
  46. Zhang, Y.; Zhou, G.; Jin, J.; Zhao, Q.; Wang, X.; Cichocki, A. Sparse Bayesian Classification of EEG for Brain–Computer Interface. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2256–2267. [Google Scholar] [CrossRef] [PubMed]
  47. Jebelli, H.; Khalili, M.M.; Lee, S. Mobile EEG-based workers stress recognition by applying deep neural network. In Advances in Informatics and Computing in Civil and Construction Engineering; Springer: Berlin/Heidelberg, Germany, 2019; pp. 173–180. [Google Scholar]
  48. Moon, S.-E.; Jang, S.; Lee, J.-S. Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2556–2560. [Google Scholar]
  49. DEAP Dataset. Available online: https://www.eecs.qmul.ac.uk/mmv/datasets/deap/ (accessed on 11 April 2020).
  50. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans. Affect. Comput. 2019, 1. [Google Scholar] [CrossRef] [Green Version]
  51. SEED Dataset. BCMI Resources. Available online: http://bcmi.sjtu.edu.cn/resource.html (accessed on 11 April 2020).
  52. Attia, M.; Hettiarachchi, I.; Hossny, M.; Nahavandi, S. A time domain classification of steady-state visual evoked potentials using deep recurrent-convolutional neural networks. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 766–769. [Google Scholar]
  53. Katsigiannis, S.; Ramzan, N. DREAMER: A Database for Emotion Recognition Through EEG and ECG Signals from Wireless Low-cost Off-the-Shelf Devices. IEEE J. Biomed. Heal. Inform. 2018, 22, 98–107. [Google Scholar] [CrossRef] [Green Version]
  54. Mousavi, Z.; Rezaii, T.Y.; Sheykhivand, S.; Farzamnia, A.; Razavi, S. Deep convolutional neural network for classification of sleep stages from single-channel EEG signals. J. Neurosci. Methods 2019, 324, 108312. [Google Scholar] [CrossRef]
  55. Moinnereau, M.-A.; Brienne, T.; Brodeur, S.; Rouat, J.; Whittingstall, K.; Plourde, E. Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir. arXiv 2018, arXiv:1804.10322. [Google Scholar]
  56. Spampinato, C.; Palazzo, S.; Kavasidis, I.; Giordano, D.; Souly, N.; Shah, M. Deep learning human mind for automated visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6809–6817. [Google Scholar]
  57. Dose, H.; Møller, J.S.; Iversen, H.K.; Puthusserypady, S. An end-to-end deep learning approach to MI-EEG signal classification for BCIs. Expert Syst. Appl. 2018, 114, 532–542. [Google Scholar] [CrossRef]
  58. Talathi, S.S. Deep recurrent neural networks for seizure detection and early seizure detection systems. arXiv 2017, arXiv:1706.03283. [Google Scholar]
  59. Kannathal, N.; Choo, M.; Acharya, U.R.; Sadasivan, P. Entropies for detection of epilepsy in EEG. Comput. Methods Programs Biomed. 2005, 80, 187–194. [Google Scholar] [CrossRef]
  60. Golmohammadi, M.; Ziyabari, S.; Shah, V.; Lopez de Diego, S.; Obeid, I.; Picone, J. Deep Architectures for Automated Seizure Detection in Scalp EEGs. arXiv 2017, arXiv:1712.09776. [Google Scholar]
  61. Harati, A.; Lopez, S.; Obeid, I.; Jacobson, M.; Tobochnik, S.; Picone, J. THE TUH EEG CORPUS: A Big Data Resource for Automated EEG Interpretation. In Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium, Philadelphia, PE, USA, 13 December 2014. [Google Scholar]
  62. Ruffini, G.; Ibanez, D.; Castellano, M.; Dunne, S.; Soria-Frisch, A. EEG-driven RNN classification for prognosis of neurodegeneration in at-risk patients. In Proceedings of the International Conference on Artificial Neural Networks, Barcelona, Spain, 6–9 September 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 306–313. [Google Scholar]
  63. Morabito, F.C.; Campolo, M.; Ieracitano, C.; Ebadi, J.M.; Bonanno, L.; Bramanti, A.; Desalvo, S.; Mammone, N.; Bramanti, P. Deep convolutional neural networks for classification of mild cognitive impaired and Alzheimer’s disease patients from scalp EEG recordings. In Proceedings of the 2016 IEEE 2nd International Forum on Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), Bologna, Italy, 7–9 September 2016; pp. 1–6. [Google Scholar]
  64. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adeli, H.; Subha, D.P. Automated eeg-based screening of depression using deep convolutional neural network. Methods Programs Biomed. 2018, 161, 103–113. [Google Scholar] [CrossRef]
  65. Sheikhani, A.; Behnam, H.; Mohammadi, M.R.; Noorozian, M. Analysis of EEG background activity in Autism disease patients with bispectrum and STFT measure. In Proceedings of the 11th WSEAS International Conference on Communications, Madrid, Spain, 22–23 August 2007; pp. 318–322. [Google Scholar]
  66. Jin, Z.; Zhou, G.; Gao, D.; Zhang, Y.L. EEG classification using sparse Bayesian extreme learning machine for brain–computer interface. Neural Comput. Appl. 2018, 1–9. [Google Scholar] [CrossRef]
  67. ADNI Data and Samples. Available online: http://adni.loni.usc.edu/data-samples/access-data/ (accessed on 11 April 2020).
  68. AMIGOS Dataset. Available online: http://www.eecs.qmul.ac.uk/mmv/datasets/amigos/readme.html (accessed on 11 April 2020).
  69. BCI Competitions. Available online: http://www.bbci.de/competition/ (accessed on 11 April 2020).
  70. BCI2000 Wiki. Available online: https://www.bci2000.org/mediawiki/index.php/Main_Page (accessed on 11 April 2020).
  71. CHB-MIT Scalp EEG Database. Available online: http://archive.physionet.org/pn6/chbmit/ (accessed on 11 April 2020).
  72. EEG Resources. Available online: https://www.isip.piconepress.com/projects/tuh_eeg/ (accessed on 11 April 2020).
  73. MICCAI BraTS 2018 Data. Available online: http://www.med.upenn.edu/sbia/brats2018/data.html (accessed on 11 April 2020).
  74. Montreal Archive of Sleep Studies. Available online: http://massdb.herokuapp.com/en/ (accessed on 11 April 2020).
  75. OpenMIIR Dataset. Available online: https://owenlab.uwo.ca/research/the_openmiir_dataset.html (accessed on 11 April 2020).
  76. SHHS Polysomnography Database. Available online: http://archive.physionet.org/pn3/shhpsgdb/ (accessed on 11 April 2020).
  77. Szczuko, P.; Lech, M.; Czyżewski, A. Comparison of Methods for Real and Imaginary Motion Classification from EEG Signals. In Intelligent Methods and Big Data in Industrial Applications; Bembenik, R., Skonieczny, Ł., Protaziuk, G., Krzyszkiewicz, M., Rybinski, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; pp. 247–257. [Google Scholar] [CrossRef]
  78. Szczuko, P.; Lech, M.; Czyżewski, A. Comparison of Classification Methods for EEG Signals of Real and Imaginary Motion. In Advances in Feature Selection for Data and Pattern Recognition; Stanczyk, U., Zielosko, B., Jain, L.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; pp. 227–239. [Google Scholar] [CrossRef]
  79. Emotiv EPOC±Technical Specifications. Available online: https://emotiv.gitbook.io/epoc-user-manual/introduction-1/technical_specifications (accessed on 12 March 2020).
  80. Jasper, H.H. The Ten-Twenty Electrode System of the International Federation. Electroencephalogr. Clin. Neurophysiol. 1958, 10, 371–375. [Google Scholar] [CrossRef]
  81. Nuwer, M.R.; Comi, G.; Emerson, R.; Fuglsang-Frederiksen, A.; Guérit, J.M.; Hinrichs, H.; Ikeda, A.; Luccas, F.J.; Rappelsburger, P. IFCN standards for digital recording of clinical EEG. Electroencephalogr. Clin. Neurophysiol. 1999, 106, 259–261. [Google Scholar] [CrossRef]
  82. Gwizdka, J.; Hosseini, R.; Cole, M.; Wang, S. Temporal dynamics of eye-tracking and EEG during reading and relevance decisions. J. Assoc. Inf. Sci. Techol. 2017, 68. [Google Scholar] [CrossRef]
  83. Joseph, P.; Kannathal, N.; Acharya, U.R. Complex Encephalogram Dynamics during Meditation. J. Chin. Clin. Med. 2007, 2, 220–230. [Google Scholar]
  84. Pizarro, J.; Guerrero, E.; Galindo, P.L. Multiple comparison procedures applied to model selection. Neurocomputing 2002, 48, 155–173. [Google Scholar] [CrossRef] [Green Version]
  85. Oliphant, T.E. Python for scientific computing. Comput. Sci. Eng. 2007, 9, 10–20. [Google Scholar] [CrossRef] [Green Version]
  86. Keras Documentation. Available online: https://keras.io/ (accessed on 12 March 2020).
  87. scikit-Learn Documentation. Available online: https://scikit-learn.org/stable/documentation.html (accessed on 12 March 2020).
  88. TensorFlow Guide. Available online: https://www.tensorflow.org/guide (accessed on 12 March 2020).
  89. Beyer, K.; Goldstein, J.; Ramakrishnan, R.; Shaft, U. When Is Nearest Neighbor Meaningful? In Proceedings of the 7th International Conference on Database Theory (ICDT), Jerusalem, Israel, 10–12 January 1999; pp. 217–235, ISBN 3-540-65452-6. [Google Scholar]
  90. Pestov, V. Is the k-NN classifier in high dimensions affected by the curse of dimensionality? arXiv 2012, arXiv:1110.4347. [Google Scholar] [CrossRef]
  91. Galecki, A.; Burzykowski, T. Linear Mixed-Effects Models Using, R. A Step-by-Step Approach. In Springer Texts in Statistics; Casella, G., Fienberg, S.E., Olkin, I., Eds.; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar] [CrossRef]
  92. Online Documentation for the Statsmodels Method Used for Calculation of MLM-Based Statistical Tests. Available online: https://www.statsmodels.org/devel/mixed_glm.html (accessed on 12 March 2020).
  93. Signorell, A.; Aho, K.; Alfons, A.; Anderegg, N.; Aragon, T.; Arppe, A. DescTools: Tools for Descriptive Statistics. R Package Version 0.99.34. 2020. Available online: https://cran.r-project.org/package=DescTools (accessed on 9 April 2020).
  94. Bengio, Y.; Glorot, X.; Bordes, A. Deep Sparse Rectifier Neural Networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 11–13 April 2011. [Google Scholar]
  95. Hinton, G.E.; Nair, V. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  96. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015. [Google Scholar] [CrossRef] [Green Version]
  97. Sutskever, I.; Martens, J.; Dahl, G.; Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 17–19 June 2013; pp. 1139–1147. [Google Scholar]
  98. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  99. Garcia, S.; Herrera, F. An Extension on “Statistical Comparisons of Classifiers over Multiple Data Sets” for all Pairwise Comparisons. J. Mach. Learn. Res. 2008, 9, 2677–2694. [Google Scholar]
  100. Haselsteiner, E.; Pfurtscheller, G. Using time-dependent neural networks for EEG classification. IEEE Trans. Rehabil. Eng. 2000, 8, 457–463. [Google Scholar] [CrossRef] [Green Version]
  101. Ziyabari, S.; Shah, V.; Golmohammadi, M.; Obeid, I.; Picone, J. Objective evaluation metrics for automatic classification of EEG events. arXiv 2017, arXiv:1712.10107. [Google Scholar]
  102. Lu, H.; Wang, M.; Yu, H. EEG Model and Location in Brain when Enjoying Music. In Proceedings of the 27th Annual IEEE Engineering in Medicine and Biology Conference, Shanghai, China, 1–4 September 2005; pp. 2695–2698. [Google Scholar]
  103. Han, J.; Kamber, M.; Jian, P. Data Mining: Concepts and Techniques, Morgan Kaufmann; Elsevier: Amsterdam, The Netherlands, 2012. [Google Scholar] [CrossRef]
Figure 1. The flowchart of the study performed.
Figure 2. First (x-axis) and second (y-axis) principal component of the training dataset parametrized with the welch32 scheme.
Figure 3. First (x-axis) and second (y-axis) principal component of the test dataset parametrized with the welch32 scheme.
Table 1. Examples of classification performance obtained for various tasks based on selected literature sources.
| EEG-Related Task | Literature Source | Algorithm | Dataset | Classification Effectiveness |
| event-related potential | [46] | SVM, SWLDA, BLDA, SBL, SBLaplace | two experimental datasets | the best approach: approximately up to 100% |
| fatigue | [7] | spatial-temporal convolutional neural network (ESTCNN) | experimental, local dataset | 97.3% |
| stress | [47] | DNN and deep CNN | experimental, local dataset | 86.62 |
| emotion | [48] | CNN | DEAP [49] | 99.72% |
| emotion | [50] | dynamical graph CNN (DGCNN) | SEED [51] | 90.4% |
| emotion | [52] | RNN with LSTM (Recurrent Neural Networks/Long Short-Term Memory) | SSVEP (steady-state visually evoked potentials) | 93.0% |
| temporal analysis | [50] | dynamical graph CNN (DGCNN) | DREAMER [53] | 86.23% |
| sleep disturbance detection | [54] | CNN (no feature extraction) | [54] | 93.55% to 98.10% depending on the number of classes |
| auditory stimulus classification | [55] | RNN | experimental, local dataset | 83.2% |
| automated visual object categorization | [56] | RNN, CNN-based regressor | experimental, local dataset | 83% |
| MI (Motor Imagery) EEG | [57] | CNN, transfer learning | [57] | two classes: 86.49%, three classes: 79.25%, four classes: 68.51% |
| epileptic seizure detection | [58] | Gated Recurrent Unit RNN | BUD [58] | 98% |
| epileptic seizure detection | [59] | Neuro-fuzzy | Local (EEG database, Bonn University) [59] | ~90% |
| epileptic seizure detection | [60] | CNNs/LSTM | TUH EEG Seizure Corpus [61]/Duke University Seizure Corpus | sensitivity: 0.3083; specificity: 0.9686 |
| Behavioral Disorder (RBD) | [62] | Echo State Networks (ESNs) | experimental, local dataset (118 subjects) | 85% |
| Alzheimer disease detection | [63] | multiple convolutional-subsampling | experimental, local dataset | 80% |
| depression screening | [64] | CNN | experimental, local dataset (patients with Mild Cognitive Impairment and healthy control group) | left hemisphere: 93.5%, right hemisphere: 96% |
| autism | [65] | bispectrum transform, ST Fourier Transform (STFT)/STFT at a bandwidth of total spectrum (STFT-BW) | experimental, local dataset (10 autism patients and 7 control subjects) | 82.4% |
Table 2. Accuracy of test data classification with k-NN classifiers for chosen values of k.
| k | ar16 | ar24 | dwt | dwt_stat | welch16 | welch32 | welch64 | mean |
| 5 | 0.4742 | 0.4605 | 0.3529 | 0.4192 | 0.5999 | 0.6304 | 0.6052 | 0.5060 |
| 7 | 0.4891 | 0.4756 | 0.3559 | 0.4327 | 0.6084 | 0.6338 | 0.6145 | 0.5010 |
| 11 | 0.4941 | 0.4875 | 0.3535 | 0.4403 | 0.6163 | 0.6386 | 0.6245 | 0.5157 |
| 14 | 0.5030 | 0.4927 | 0.3533 | 0.4456 | 0.6141 | 0.6370 | 0.6358 | 0.5259 |
| 17 | 0.5066 | 0.4998 | 0.3563 | 0.4569 | 0.6129 | 0.6362 | 0.6322 | 0.5287 |
| mean | 0.4934 | 0.4832 | 0.3544 | 0.4389 | 0.6103 | 0.6352 | 0.6224 | |
Table 3. Results of the mixed linear model analysis for data from Table 2. The values presented are the coefficients of a linear model calculated by the analysis procedure, the standard error, the statistic and p-value of a test for statistical significance, and the left and right boundaries of the confidence interval for the influence of each algorithm in comparison to the reference algorithm (welch32). Boundary probabilities of the confidence interval are 0.025 and 0.975.
| | Coeff. | Std. Err. | z | P > |z| | Left c.f. Boundary | Right c.f. Boundary |
| Intercept (welch32-based influence) | 0.635 | 0.012 | 53.682 | 0.000 | 0.612 | 0.658 |
| ar16 | −0.142 | 0.017 | −8.474 | 0.000 | −0.175 | −0.109 |
| ar24 | −0.152 | 0.017 | −9.082 | 0.000 | −0.185 | −0.119 |
| dwt | −0.281 | 0.017 | −16.782 | 0.000 | −0.314 | −0.248 |
| dwt_stat | −0.196 | 0.017 | −11.728 | 0.000 | −0.229 | −0.163 |
| welch16 | −0.025 | 0.017 | −1.487 | 0.137 | −0.058 | 0.008 |
| welch64 | −0.013 | 0.017 | −0.763 | 0.446 | −0.046 | 0.020 |
Table 4. Normalized confusion matrix for the 11-NN classifier and the welch32 feature extraction scheme (left). Normalized confusion matrix for the 17-NN classifier and the ar16 feature extraction scheme (right).
Confusion matrix for 11-NN welch32:
| | Meditation | Music Video | Logic Game |
| meditation | 0.82 | 0.09 | 0.08 |
| music video | 0.11 | 0.47 | 0.42 |
| logic game | 0.04 | 0.34 | 0.62 |
Confusion matrix for 17-NN ar16:
| | Meditation | Music Video | Logic Game |
| meditation | 0.72 | 0.19 | 0.10 |
| music video | 0.43 | 0.32 | 0.25 |
| logic game | 0.27 | 0.25 | 0.48 |
Table 5. Normalized confusion matrix for the 17-NN classifier and the dwt feature extraction scheme (left). Normalized confusion matrix for the 17-NN classifier and the dwt_stat feature extraction scheme (right).
Confusion matrix for 17-NN dwt:
| | Meditation | Music Video | Logic Game |
| meditation | 0.09 | 0.26 | 0.64 |
| music video | 0.08 | 0.26 | 0.67 |
| logic game | 0.05 | 0.23 | 0.72 |
Confusion matrix for 17-NN dwt_stat:
| | Meditation | Music Video | Logic Game |
| meditation | 0.73 | 0.14 | 0.13 |
| music video | 0.43 | 0.29 | 0.28 |
| logic game | 0.38 | 0.28 | 0.35 |
Table 6. Values of precision, recall, and F1 score for each signal class for chosen variants of Experiment 1.
| Scenario | Class | Precision | Recall | F1 |
| 11-NN welch32 | meditation | 0.8497 | 0.8234 | 0.8363 |
| 11-NN welch32 | logic game | 0.5521 | 0.6215 | 0.5847 |
| 11-NN welch32 | music video | 0.5204 | 0.4710 | 0.4944 |
| 17-NN ar16 | meditation | 0.5049 | 0.7177 | 0.5928 |
| 17-NN ar16 | logic game | 0.5824 | 0.4807 | 0.5267 |
| 17-NN ar16 | music video | 0.4270 | 0.3216 | 0.3669 |
| 17-NN dwt | meditation | 0.4274 | 0.0925 | 0.1521 |
| 17-NN dwt | logic game | 0.3545 | 0.7201 | 0.4751 |
| 17-NN dwt | music video | 0.3408 | 0.2563 | 0.2926 |
| 17-NN dwt_stat | meditation | 0.4772 | 0.7334 | 0.5782 |
| 17-NN dwt_stat | logic game | 0.4593 | 0.3476 | 0.3957 |
| 17-NN dwt_stat | music video | 0.4101 | 0.2896 | 0.3395 |
Table 7. Values of precision, recall, and F1 score in 10-fold cross-validation for the best and the worst feature extraction method variants of Experiment 1 (k-NN classifier).
| Scenario | Class | Precision | Recall | F1 |
| 11-NN welch32 | meditation | 0.8621 | 0.8300 | 0.8458 |
| 11-NN welch32 | logic game | 0.5794 | 0.5621 | 0.5706 |
| 11-NN welch32 | music video | 0.5018 | 0.5354 | 0.5180 |
| 17-NN dwt | meditation | 0.4402 | 0.0957 | 0.1572 |
| 17-NN dwt | logic game | 0.3587 | 0.6098 | 0.4517 |
| 17-NN dwt | music video | 0.3370 | 0.3649 | 0.3504 |
Table 8. Accuracy of test data classification with support vector machine (SVM)-linear classifier for chosen values of the C parameter.
| C | ar16 | ar24 | dwt | dwt_stat | welch16 | welch32 | welch64 | mean |
| 0.01 | 0.5072 | 0.5397 | 0.3353 | 0.5149 | 0.5653 | 0.6122 | 0.6290 | 0.5291 |
| 0.1 | 0.5083 | 0.5397 | 0.3351 | 0.5129 | 0.6070 | 0.6378 | 0.6528 | 0.5491 |
| 1 | 0.5085 | 0.5393 | 0.3287 | 0.5131 | 0.6249 | 0.6671 | 0.6612 | 0.5490 |
| 10 | 0.5085 | 0.5395 | - | 0.5145 | 0.6550 | 0.6628 | 0.6598 | 0.5900 |
| 100 | 0.5085 | 0.5397 | - | - | 0.6548 | 0.6638 | 0.6548 | 0.6043 |
| mean | 0.5082 | 0.5396 | 0.3330 | 0.5138 | 0.6214 | 0.6487 | 0.6515 | |
Table 9. Coefficients of a linear model calculated by the analysis procedure, standard error, statistic, and p-value of a test for statistical significance, as well as the left and right boundaries of the confidence interval for the influence of each algorithm in comparison to the reference algorithm (welch64). Boundary probabilities of the confidence interval are 0.025 and 0.975.
| | Coeff. | Std. Err. | z | P > |z| | Left c.f. Boundary | Right c.f. Boundary |
| Intercept (welch64-based influence) | 0.652 | 0.012 | 53.764 | 0.000 | 0.628 | 0.675 |
| ar16 | −0.143 | 0.017 | −8.291 | 0.000 | −0.177 | −0.109 |
| ar24 | −0.112 | 0.019 | −5.965 | 0.000 | −0.149 | −0.075 |
| dwt | −0.318 | 0.022 | −14.386 | 0.000 | −0.362 | −0.275 |
| dwt_stat | −0.138 | 0.018 | −7.680 | 0.000 | −0.173 | −0.103 |
| welch16 | −0.030 | 0.026 | −1.144 | 0.253 | −0.082 | 0.021 |
| welch32 | −0.003 | 0.024 | −0.118 | 0.906 | −0.049 | 0.043 |
Table 10. Accuracy values for the case of SVM (linear kernel) in 10-fold cross-validation.
| C | dwt | welch32 |
| 0.01 | 0.3330 | - |
| 1 | - | 0.6595 |
Table 11. Normalized confusion matrix for SVM classifier with linear kernel, value of C = 1, and welch32 feature extraction scheme (left). Normalized confusion matrix for SVM classifier with linear kernel, value of C = 1, and ar16 feature extraction scheme (right).
Confusion matrix for SVM (linear kernel, C = 1) with welch32:
| | Meditation | Music Video | Logic Game |
| meditation | 0.85 | 0.12 | 0.03 |
| music video | 0.13 | 0.50 | 0.37 |
| logic game | 0.04 | 0.31 | 0.66 |
Confusion matrix for SVM (linear kernel, C = 1) with ar16:
| | Meditation | Music Video | Logic Game |
| meditation | 0.73 | 0.15 | 0.12 |
| music video | 0.40 | 0.28 | 0.31 |
| logic game | 0.27 | 0.22 | 0.51 |
Table 12. Normalized confusion matrix for SVM classifier with linear kernel, value of C = 0.01, and dwt feature extraction scheme (left). Normalized confusion matrix for SVM classifier with linear kernel, value of C = 0.01, and dwt_stat feature extraction scheme (right).
Confusion matrix for SVM (linear kernel, C = 0.01) with dwt:
| | Meditation | Music Video | Logic Game |
| meditation | 0.30 | 0.35 | 0.35 |
| music video | 0.30 | 0.35 | 0.35 |
| logic game | 0.31 | 0.34 | 0.35 |
Confusion matrix for SVM (linear kernel, C = 0.01) with dwt_stat:
| | Meditation | Music Video | Logic Game |
| meditation | 0.68 | 0.20 | 0.13 |
| music video | 0.25 | 0.37 | 0.38 |
| logic game | 0.18 | 0.32 | 0.50 |
Table 13. Values of precision, recall, and F1 score for each signal class for chosen variants of Experiment 2.
| Variant | Class | Precision | Recall | F1 |
| welch32, C = 1 | meditation | 0.8369 | 0.8501 | 0.8434 |
| welch32, C = 1 | logic game | 0.6179 | 0.6560 | 0.6364 |
| welch32, C = 1 | music video | 0.5367 | 0.4952 | 0.5151 |
| ar16, C = 1 | meditation | 0.5209 | 0.7310 | 0.6083 |
| ar16, C = 1 | logic game | 0.5400 | 0.5103 | 0.5247 |
| ar16, C = 1 | music video | 0.4360 | 0.2842 | 0.3441 |
| dwt, C = 0.01 | meditation | 0.3324 | 0.3017 | 0.3163 |
| dwt, C = 0.01 | logic game | 0.3345 | 0.3525 | 0.3432 |
| dwt, C = 0.01 | music video | 0.3388 | 0.3519 | 0.3452 |
| dwt_stat, C = 0.01 | meditation | 0.6134 | 0.6753 | 0.6429 |
| dwt_stat, C = 0.01 | logic game | 0.4946 | 0.5000 | 0.4973 |
| dwt_stat, C = 0.01 | music video | 0.4159 | 0.3694 | 0.3913 |
Table 14. Values of precision, recall, and F1 score in 10-fold cross-validation for the best and worst results obtained from the training/validation/test scheme, as contained in Table 13.
| Variant | Class | Precision | Recall | F1 |
| welch32, C = 1 | meditation | 0.8472 | 0.8594 | 0.8533 |
| welch32, C = 1 | logic game | 0.6052 | 0.6246 | 0.6147 |
| welch32, C = 1 | music video | 0.5187 | 0.4946 | 0.5063 |
| dwt, C = 0.01 | meditation | 0.3288 | 0.3134 | 0.3209 |
| dwt, C = 0.01 | logic game | 0.3344 | 0.3394 | 0.3369 |
| dwt, C = 0.01 | music video | 0.3356 | 0.3463 | 0.3408 |
Table 15. Accuracy of test data classification with the SVM-RBF classifier for chosen values of the C and γ parameters.
| C | γ | ar16 | ar24 | dwt | dwt_stat | welch16 | welch32 | welch64 | mean |
| 0.01 | 0.1 | 0.3769 | 0.3392 | 0.4097 | 0.3976 | 0.5651 | 0.6133 | 0.6231 | 0.4750 |
| 0.01 | 1 | 0.3333 | 0.3333 | 0.3414 | 0.3523 | 0.5689 | 0.6193 | 0.6332 | 0.4545 |
| 0.01 | 10 | 0.4252 | 0.4180 | 0.3557 | 0.3333 | 0.6072 | 0.6378 | 0.6380 | 0.4879 |
| 0.1 | 0.1 | 0.4821 | 0.3734 | 0.4097 | 0.3976 | 0.5651 | 0.6169 | 0.6310 | 0.4965 |
| 0.1 | 1 | 0.3333 | 0.3333 | 0.3392 | 0.3529 | 0.6161 | 0.6453 | 0.6578 | 0.4683 |
| 0.1 | 10 | 0.4252 | 0.4178 | 0.3557 | 0.3333 | 0.6334 | 0.6683 | 0.6713 | 0.5007 |
| 1 | 0.1 | 0.5286 | 0.5107 | 0.4222 | 0.3333 | 0.6157 | 0.6455 | 0.6604 | 0.5309 |
| 1 | 1 | 0.3597 | 0.3333 | 0.4319 | 0.3535 | 0.6338 | 0.6655 | 0.6709 | 0.4927 |
| 1 | 10 | 0.3333 | 0.4178 | 0.3557 | 0.3333 | 0.6578 | 0.6846 | 0.6866 | 0.4956 |
| 10 | 0.1 | 0.4998 | 0.5070 | - | 0.3535 | 0.6334 | 0.6650 | 0.6626 | 0.5535 |
| 10 | 1 | 0.3636 | 0.3366 | - | 0.3333 | 0.6578 | 0.6681 | 0.6765 | 0.5060 |
| 10 | 10 | 0.3333 | 0.4210 | - | - | 0.6632 | 0.6933 | 0.6644 | 0.5550 |
| 100 | 0.1 | 0.5000 | - | - | 0.3327 | 0.6133 | 0.6632 | 0.6626 | 0.5544 |
| 100 | 1 | 0.3636 | - | - | 0.3535 | 0.6548 | 0.6820 | 0.6725 | 0.5453 |
| 100 | 10 | 0.3333 | - | - | 0.3333 | 0.6683 | 0.6701 | 0.6606 | 0.5331 |
| mean | | 0.3994 | 0.3951 | 0.3801 | 0.3495 | 0.6236 | 0.6559 | 0.6581 | |
Table 16. Coefficients of a linear model calculated by the analysis procedure, standard error, statistic, and p-value of a test for statistical significance, as well as the left and right boundaries of the confidence interval for the influence of each algorithm in comparison to the reference algorithm (welch64). Boundary probabilities of the confidence interval (c.f.) are 0.025 and 0.975.
| | Coeff. | Std. Err. | z | P > |z| | Left c.f. Boundary | Right c.f. Boundary |
| Intercept (welch64-based influence) | 0.658 | 0.042 | 15.639 | 0.000 | 0.576 | 0.741 |
| ar16 | −0.259 | 0.051 | −5.049 | 0.000 | −0.359 | −0.158 |
| ar24 | −0.263 | 0.053 | −4.977 | 0.000 | −0.367 | −0.159 |
| dwt | −0.278 | 0.062 | −4.480 | 0.000 | −0.400 | −0.156 |
| dwt_stat | −0.309 | 0.062 | −4.946 | 0.000 | −0.431 | −0.186 |
| welch16 | −0.035 | 0.064 | −0.542 | 0.588 | −0.159 | 0.090 |
| welch32 | −0.002 | 0.063 | −0.035 | 0.972 | −0.125 | 0.121 |
Table 17. Accuracy of test data classification for the SVM-RBF classifier for 10-fold cross-validation performed for the best and worst results obtained from the training/validation/test scheme.
| C | γ | dwt_stat | welch32 |
| 0.01 | 0.1 | 0.3229 | - |
| 10 | 10 | - | 0.6905 |
Table 18. Normalized confusion matrix for SVM classifier with RBF kernel, C = 10, γ = 10, and the welch32 feature extraction scheme (left). Normalized confusion matrix for SVM classifier with RBF kernel, C = 1, γ = 0.1, and the ar16 feature extraction scheme (right).
Confusion matrix for SVM (RBF kernel, C = 10, γ = 10) with welch32:
| | Meditation | Music Video | Logic Game |
| meditation | 0.86 | 0.11 | 0.03 |
| music video | 0.10 | 0.53 | 0.37 |
| logic game | 0.01 | 0.30 | 0.69 |
Confusion matrix for SVM (RBF kernel, C = 1, γ = 0.1) with ar16:
| | Meditation | Music Video | Logic Game |
| meditation | 0.62 | 0.24 | 0.14 |
| music video | 0.28 | 0.37 | 0.35 |
| logic game | 0.18 | 0.22 | 0.59 |
Table 19. Normalized confusion matrix for SVM classifier with RBF kernel, C = 1, γ = 1, and the dwt feature extraction scheme (left). Normalized confusion matrix for SVM classifier with RBF kernel, C = 0.01, γ = 0.1, and the dwt_stat feature extraction scheme (right).
Confusion matrix for SVM (RBF kernel, C = 1, γ = 1) with dwt:
| | Meditation | Music Video | Logic Game |
| meditation | 0.70 | 0.16 | 0.14 |
| music video | 0.51 | 0.23 | 0.26 |
| logic game | 0.40 | 0.23 | 0.37 |
Confusion matrix for SVM (RBF kernel, C = 0.01, γ = 0.1) with dwt_stat:
| | Meditation | Music Video | Logic Game |
| meditation | 0.52 | 0.26 | 0.22 |
| music video | 0.37 | 0.34 | 0.29 |
| logic game | 0.35 | 0.32 | 0.33 |
Table 20. Values of precision, recall, and F1 score for each signal class for chosen variants of Experiment 3.
| Variant | Class | Precision | Recall | F1 |
| welch32, C = 10, γ = 10 | meditation | 0.8876 | 0.8597 | 0.8735 |
| welch32, C = 10, γ = 10 | logic game | 0.6287 | 0.6898 | 0.6578 |
| welch32, C = 10, γ = 10 | music video | 0.5676 | 0.5302 | 0.5483 |
| ar16, C = 1, γ = 0.1 | meditation | 0.5716 | 0.6203 | 0.5950 |
| ar16, C = 1, γ = 0.1 | logic game | 0.5496 | 0.5931 | 0.5705 |
| ar16, C = 1, γ = 0.1 | music video | 0.4457 | 0.3724 | 0.4058 |
| dwt, C = 1, γ = 1 | meditation | 0.4344 | 0.6983 | 0.5356 |
| dwt, C = 1, γ = 1 | logic game | 0.4791 | 0.3664 | 0.4152 |
| dwt, C = 1, γ = 1 | music video | 0.3680 | 0.2310 | 0.2838 |
| dwt_stat, C = 0.01, γ = 0.1 | meditation | 0.4178 | 0.5193 | 0.4631 |
| dwt_stat, C = 0.01, γ = 0.1 | logic game | 0.3997 | 0.3337 | 0.3638 |
| dwt_stat, C = 0.01, γ = 0.1 | music video | 0.3685 | 0.3398 | 0.3536 |
Table 21. Values of precision, recall, and F1 score for 10-fold cross-validation for the best and worst results of the training/validation/test scheme, as contained in Table 20.
| Variant | Class | Precision | Recall | F1 |
| welch32, C = 10, γ = 10 | meditation | 0.8848 | 0.8722 | 0.8785 |
| welch32, C = 10, γ = 10 | logic game | 0.6269 | 0.6665 | 0.6461 |
| welch32, C = 10, γ = 10 | music video | 0.5601 | 0.5326 | 0.5460 |
| dwt_stat, C = 0.01, γ = 0.1 | meditation | 0.3271 | 0.3925 | 0.3568 |
| dwt_stat, C = 0.01, γ = 0.1 | logic game | 0.3211 | 0.3854 | 0.3503 |
| dwt_stat, C = 0.01, γ = 0.1 | music video | 0.3182 | 0.1909 | 0.2387 |
Table 22. Accuracy of test data classification using the neural network with a single hidden layer. The evaluation was repeated 10 times for each parameterization method.
| ar16 | ar24 | dwt | dwt_stat | welch16 | welch32 | welch64 |
| 0.5149 | 0.5296 | 0.3428 | 0.4911 | 0.6705 | 0.7031 | 0.6894 |
| 0.5191 | 0.5339 | 0.3313 | 0.4986 | 0.6721 | 0.7048 | 0.6961 |
| 0.5131 | 0.5266 | 0.3386 | 0.4962 | 0.6713 | 0.7046 | 0.6941 |
| 0.5163 | 0.5240 | 0.3400 | 0.4847 | 0.6763 | 0.6963 | 0.6913 |
| 0.5266 | 0.5347 | 0.3424 | 0.4974 | 0.6653 | 0.7058 | 0.6894 |
| 0.5208 | 0.5341 | 0.3424 | 0.4736 | 0.6755 | 0.7003 | 0.6955 |
| 0.5155 | 0.5236 | 0.3434 | 0.4942 | 0.6717 | 0.6997 | 0.6904 |
| 0.5169 | 0.5353 | 0.3428 | 0.4923 | 0.6626 | 0.7035 | 0.6870 |
Table 23. Result of the Dunn post hoc test in the form of a p-value matrix. Values indicating no statistically significant differences are marked in bold font in the original table.
| | ar16 | ar24 | dwt | welch16 | welch32 | welch64 | dwt_stat |
| ar16 | — | 0.307 | 0.025 | 0.031 | <10−3 | 0.001 | 0.255 |
| ar24 | 0.307 | — | 0.001 | 0.255 | <10−3 | 0.025 | 0.0301 |
| dwt | 0.025 | 0.001 | — | <10−3 | <10−3 | <10−3 | 0.272 |
| welch16 | 0.031 | 0.255 | <10−3 | — | 0.028 | 0.272 | <10−3 |
| welch32 | <10−3 | <10−3 | <10−3 | 0.028 | — | 0.272 | <10−3 |
| welch64 | 0.001 | 0.025 | <10−3 | 0.272 | 0.272 | — | <10−3 |
| dwt_stat | 0.255 | 0.031 | 0.272 | <10−3 | <10−3 | <10−3 | — |
Table 24. Specifications of the neural networks for which the highest values of classification accuracy were achieved.
| Hidden Layers | Activation Function | SGD Parameters | Patience | Max Epochs | Accuracy |
| 3 | LReLU (a = 0.2) | lr = 0.01, decay = 10−6, momentum = 0.9 | 50 | 2000 | 0.7477 |
| 4 | tanh + LReLU (a = 0.2) | lr = 0.005, decay = 10−6, momentum = 0.9 | 250 | 3000 | 0.7469 |
| 6 | ReLU | lr = 0.01, decay = 10−6, momentum = 0.9 | 70 | 2000 | 0.7467 |
| 3 | tanh | lr = 0.01, decay = 10−6, momentum = 0.9 | 250 | 3000 | 0.7446 |
Table 25. Normalized confusion matrix for the NN with three hidden layers, the LeakyReLU activation function, and the welch32 feature extraction scheme (the training/validation/test scheme is shown first, followed by the outcomes of 10-fold cross-validation).
Training/validation/test scheme:
| | Meditation | Music Video | Logic Game |
| meditation | 0.87 | 0.11 | 0.02 |
| music video | 0.06 | 0.69 | 0.26 |
| logic game | 0.02 | 0.30 | 0.68 |
10-fold cross-validation:
| | Meditation | Music Video | Logic Game |
| meditation | 0.90 | 0.08 | 0.02 |
| music video | 0.07 | 0.63 | 0.29 |
| logic game | 0.02 | 0.29 | 0.69 |
Table 26. Values of precision, recall, and F1 score for each class for the NN with three hidden layers, the LeakyReLU activation function, and the welch32 feature extraction scheme.
| Class | Precision | Recall | F1 Score |
| meditation | 0.9203 | 0.8724 | 0.8957 |
| logic game | 0.7116 | 0.6832 | 0.6971 |
| music video | 0.6296 | 0.6874 | 0.6572 |
Table 27. Values of precision, recall, and F1 score for each activity class for the NN with three hidden layers, the LeakyReLU activation function, and the welch32 feature extraction scheme (10-fold cross-validation).
| Class | Precision | Recall | F1 Score |
| meditation | 0.9079 | 0.8973 | 0.9026 |
| logic game | 0.6840 | 0.6933 | 0.6886 |
| music video | 0.6343 | 0.6332 | 0.6337 |
