#### 2.5.4. Regularized

Regularization consists of carefully controlling classifier complexity to prevent overtraining, which gives these classifiers excellent generalization performance. Regularized Fisher LDA (RF-LDA), linear SVM, and the support vector machine with a radial basis function kernel (RBF-SVM) are examples of regularized classifiers.
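As an illustration, a minimal sketch of how the regularization strength of these classifiers is typically controlled, using scikit-learn (the library choice is ours; the review does not prescribe an implementation):

```python
# Hedged sketch: regularized versions of the classifiers named above.
# Library and parameter values are illustrative, not the paper's.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Regularized Fisher LDA: shrinkage pulls the covariance estimate toward
# the identity, limiting model complexity on small-sample EEG feature sets.
rf_lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

# Linear and RBF-kernel SVMs: a smaller C imposes a stronger complexity
# penalty (wider margin), i.e., more regularization.
linear_svm = SVC(kernel="linear", C=1.0)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale")
```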

#### 2.5.5. General Taxonomy of Classification Algorithms

Another taxonomy uses the classifiers' properties to divide them into general types of algorithms: linear, neural networks, nonlinear Bayesian, nearest neighbor classifiers, and combinations of classifiers (ensembles). Most of the more specialized algorithms can be derived from these general types. Table 6 shows this taxonomy criterion with five categories of general classifiers: (1) linear, (2) neural networks, (3) nonlinear Bayesian, (4) nearest neighbor classifiers, and (5) combinations of classifiers or ensembles [44,56,58].

Every general classifier can be characterized in terms of the previously mentioned framework models. For instance, SVM is discriminant, static, stable, and regularized; HMM is generative, dynamic, unstable, and not regularized; and kNN is discriminant, static, stable, and not regularized.

Consequently, the suggested guidelines for classifier selection also apply to this categorization. Table 6 presents the usage statistics of these classifiers in the 2015–2020 literature. The most noteworthy classifiers are the following: neural networks: CNN (46.16%); linear classifiers: SVM (30.3%) and LDA (5.5%); nearest neighbors: kNN (4.5%); and ensemble classifiers: AdaBoost (3.9%).


**Table 6.** Categories of general classifiers.

#### *2.6. Performance Evaluation*

Results must be reported consistently so that different research groups can understand and compare them. Hence, evaluation procedures need to be chosen and described accurately [119]. Evaluating a classifier's execution involves addressing performance measures, error estimation, and statistical significance testing [120]. Performance measures and error estimation quantify the fulfillment rate of the classifier's function. The most recommended performance evaluation measures are shown in Table 7: the confusion matrix, accuracy, error rate, and other measures derived from the confusion matrix, such as recall, specificity, precision, Area Under the Curve (AUC), and F-measure. Other performance evaluation coefficients are Cohen's kappa (κ) [121], information transfer rate (ITR) [65], and written symbol rate (WSR) [121].
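A minimal sketch of how the confusion-matrix-derived measures and Cohen's kappa can be computed (function and variable names are ours; the matrix layout, rows = true classes and columns = predicted classes, is an assumption):

```python
import numpy as np

def confusion_matrix_measures(cm):
    """Derive the Table 7 measures from a square confusion matrix
    (rows = true classes, columns = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)                 # true positives per class
    fp = cm.sum(axis=0) - tp         # false positives per class
    fn = cm.sum(axis=1) - tp         # false negatives per class
    tn = total - tp - fp - fn        # true negatives per class

    accuracy = tp.sum() / total
    error_rate = 1.0 - accuracy
    recall = tp / (tp + fn)          # sensitivity per class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_o = accuracy
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (p_o - p_e) / (1.0 - p_e)

    return {"accuracy": accuracy, "error rate": error_rate,
            "recall": recall, "specificity": specificity,
            "precision": precision, "F-measure": f_measure,
            "kappa": kappa}

# Example: a hypothetical 2-class (e.g., high/low valence) confusion matrix
print(confusion_matrix_measures([[42, 8], [11, 39]]))
```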

Performance evaluation and error estimation may need to be complemented with a significance evaluation, because high accuracies can be of little import if the sample size is too small or the classes are imbalanced (as labeled EEG signals typically are). Therefore, testing the significance of classification results is essential. There are general approaches that can handle arbitrary class distributions and verify whether accuracy values lie significantly above chance levels. Commonly used methods are the theoretical level of random classification and the adjusted Wald confidence interval for classification accuracy.

The theoretical level of random classification tests classification results for randomness: it is the sum of the products between the experimental results' classification probabilities and the probabilities expected if all categorization occurred randomly (p0 = classification accuracy of a random classifier). This approach can only be used after the classification has been performed [122].
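A sketch of this computation under our reading of the description: p0 is the dot product of the empirical class proportions and the classifier's output proportions, which is why it can only be computed after classification (names are ours):

```python
import numpy as np

def random_classifier_accuracy(y_true, y_pred):
    """Theoretical accuracy p0 of a random classifier: the sum of the
    products between the empirical probability of each class and the
    probability that the classifier assigns that class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    p_true = np.array([(y_true == c).mean() for c in classes])
    p_pred = np.array([(y_pred == c).mean() for c in classes])
    return float(np.dot(p_true, p_pred))

# Example: imbalanced 2-class problem; note p0 lies above 0.5
y_true = [0] * 70 + [1] * 30
y_pred = [0] * 60 + [1] * 40
print(random_classifier_accuracy(y_true, y_pred))  # 0.54
```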

The adjusted Wald confidence interval gives the lower and upper confidence limits for the probability of correct classification, which specify the interval for the classifier performance evaluation index [123].
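A sketch of the adjusted Wald (Agresti–Coull) interval, assuming n test trials with x correct classifications; z = 1.96 corresponds to a 95% confidence level:

```python
import math

def adjusted_wald_interval(x, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) confidence interval for the
    probability of correct classification, given x correct out of n."""
    n_adj = n + z**2
    p_adj = (x + z**2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

# Example: 78 correct classifications out of 100 test trials
lo, hi = adjusted_wald_interval(78, 100)
print(f"95% CI for accuracy: [{lo:.3f}, {hi:.3f}]")
# If the lower limit lies above the random-classification level p0,
# the accuracy can be considered significant at this confidence level.
```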





#### **3. Literature Review of BCI Systems that Estimate Emotional States**

In recent years, several research papers have been published on emotion recognition using BCI devices for data capture. These publications use different models and strategies, producing a wide range of frameworks. Table 8 offers a summary of the research in this field from 2015 to 2020.

The following components characterize the systems presented in Table 8: (1) Stimulus type; (2) databases, generated by the paper's authors or publicly available; (3) the number of participants; (4) extraction and selection of characteristics; (5) features; (6) classification algorithms; (7) number and types of classes; and (8) performance evaluation.

The preprocessing methods applied in the reviewed studies are mostly similar and standard, so this information was omitted from Table 8.

#### *3.1. Emotion Elicitation Methods*

This article analyzes research papers that used different resources to provoke emotions in their subjects. These stimuli include music videos, film clips, music tracks, self-induced disgust (produced by remembering an unpleasant odor), and, as an example of active elicitation of emotions, risky situations in a flight simulator. EEG-based BCI systems frequently use the public DEAP and SEED databases, which apply music videos and film clips as stimuli, respectively. Different stimuli provoke emotions that affect different areas of the brain and produce EEG signals that can be recognized as corresponding to specific emotions. Figure 5 shows the frequency with which different emotion elicitation methods are applied to generate the datasets used in the reviewed systems.

**Figure 5.** Emotion elicitation methods.

Few research papers resort to more elaborate platforms to provoke "real life" emotions. However, such methods have been applied to physiological responses other than EEG, such as skin conductance, respiration, electrocardiogram (ECG), and facial expressions, among others [124]. Some authors state that stimuli provoking wide-ranging emotions could make it challenging to explore the brain mechanisms activated in generating a specific emotion; in this sense, focusing on a particular emotion could improve our understanding of such mechanisms. From our research sample, we highlight studies that examine individual emotions, such as dislike and disgust, separately [37,125].






Table 8 abbreviations: selection and weighting method (SFEW); fractal dimensions (FD); Genetic Algorithm (GA); Graph regularized Extreme Learning Machine (GELM) NN; Graph Regularized Sparse Linear Regression (GRSLR); High Order Crossing (HOC); Linear Discriminant Analysis (LDA); Logistic Regression (LR); Long Short-Term Memory Recurrent Neural Network (LSTM RNN); Minimum-Redundancy-Maximum-Relevance (mRMR); Normalized Mutual Information (NMI); Principal Component Analysis (PCA); Radial Basis Function (RBF); Short-Time Fourier Transform (STFT); Stepwise Discriminant Analysis (SDA); Support Vector Machine (SVM); Wavelet Transform (WT).

#### *3.2. Number of Participants to Generate the System Dataset*

Figure 6 presents the number of participants in the experiments conducted to obtain the EEG datasets used to train and test the emotion recognition systems. Most systems used between 31 and 40 subjects (53%) or between 11 and 20 subjects (31%). The targeted studies used EEG data from healthy individuals.

**Figure 6.** Number of participants in EEG datasets.
