**EEG-Based Emotion Recognition Using Logistic Regression with Gaussian Kernel and Laplacian Prior and Investigation of Critical Frequency Bands**

#### **Chao Pan 1,2,3,\* , Cheng Shi <sup>4</sup> , Honglang Mu 3,5, Jie Li <sup>2</sup> and Xinbo Gao 2,6**


Received: 2 February 2020; Accepted: 25 February 2020; Published: 29 February 2020

**Abstract:** Emotion plays a central part in human attention, decision-making, and communication. Electroencephalogram (EEG)-based emotion recognition has advanced considerably owing to the development of Brain-Computer Interface (BCI) applications and its effectiveness compared to body expressions and other physiological signals. Despite significant progress in affective computing, emotion recognition remains an open problem. This paper introduces Logistic Regression (LR) with a Gaussian kernel and a Laplacian prior for EEG-based emotion recognition. The Gaussian kernel enhances the separability of the EEG data in the transformed space. The Laplacian prior promotes the sparsity of the learned LR regressors to avoid over-specification. The LR regressors are optimized using the logistic regression via variable splitting and augmented Lagrangian (LORSAL) algorithm. For simplicity, the introduced method is denoted as LORSAL. Experiments were conducted on the dataset for emotion analysis using EEG, physiological and video signals (DEAP). Various spectral features and features combining electrodes (power spectral density (PSD), differential entropy (DE), differential asymmetry (DASM), rational asymmetry (RASM), and differential caudality (DCAU)) were extracted from different frequency bands (Delta, Theta, Alpha, Beta, Gamma, and Total) of the EEG signals. Naive Bayes (NB), the support vector machine (SVM), linear LR with L1-regularization (LR\_L1), and linear LR with L2-regularization (LR\_L2) were used for comparison in the binary emotion classification for valence and arousal. LORSAL obtained the best classification accuracies (77.17% and 77.03% for valence and arousal, respectively) on the DE features extracted from the total frequency bands. This paper also investigates the critical frequency bands in emotion recognition. The experimental results showed the superiority of the Gamma and Beta bands in classifying emotions. The results also showed that DE was the most informative feature, and that DASM and DCAU achieved relatively good accuracies at lower computational complexity. A comparison of LORSAL and recent deep learning (DL) methods is included in the discussion. Conclusions and future work are presented in the final section.

**Keywords:** emotion recognition; electroencephalogram (EEG); logistic regression; Gaussian kernel; Laplacian prior; affective computing

#### **1. Introduction**

Affective computing, defined by Picard [1], is a multidisciplinary research field that relates to computer science, psychology, neuroscience, and cognitive science. Levenson [2] believed that during natural selection, emotions were preserved because rapid response mechanisms were necessary when facing different environmental threats. Emotion plays a central role in human behavior, such as perception, attention, decision-making, and communication [3]. Positive emotions contribute to healthy life and efficient work, while negative emotions may result in health problems [4].

Emotion recognition methods fall into two main categories, according to the ways humans communicate emotions: body expressions and physiological signals. Body expressions are physical manifestations and easy to collect. Theorists argue that each emotion corresponds to a unique somatic response [1]. However, human physical manifestations are easily affected by the user's cultural background and social environment [4]. Physiological signals [3,4] are internal signals, such as the electroencephalogram (EEG), electrocardiogram (ECG), heart rate (HR), electromyogram (EMG), and galvanic skin response (GSR). According to Cannon's theory [5], emotion changes are associated with quick responses in physiological signals coordinated by the autonomic nervous system (ANS). This makes physiological signals difficult to control deliberately, overcoming the shortcomings of body expressions [4], and they have been widely applied in many studies for emotion recognition [3,4]. Still, physiological signals such as ECG and EMG are not a direct reaction to emotion changes. According to psychology and neurophysiology, emotion generation and activity are closely related to the activity of the cerebral cortex. EEG signals, which directly reflect the electrical activity of the brain, have thus been widely applied in many fields, including cognitive performance prediction [6], mental load analysis [7,8], mental fatigue assessment [9], recommendation systems [10], and decoding visual stimuli [11,12].

Recently, the field of EEG-based emotion recognition has attracted much interest, spanning Brain-Computer Interface (BCI) systems, basic emotion theories, and machine learning algorithms [13,14]. In machine learning, a definition of the emotion model is necessary to specify the objective of the algorithms. There are mainly two kinds of models [3]: discrete emotion spaces and continuous emotion models. Among these models, the valence-arousal model by Russell [15] has been widely used in emotion recognition because it makes assessment criteria simple to establish. The progress of EEG-based emotion recognition also covers feature extraction, feature selection, dimension reduction, and classification algorithms [13,14]. After pre-processing the original EEG signals, the next step is to extract and select informative features that enhance the discriminative signal characteristics. Traditionally, feature extraction and selection are based on neuroscience and cognitive science [16]. For example, frontal asymmetry of Alpha band power for differentiating valence levels has attracted much interest in neuroscience research [17]. Besides neuro-scientific assumptions, computational methods from machine learning are also applied for feature extraction and selection in EEG-based emotion recognition [16,18]. Several studies transformed the pre-processed EEG signals into various analysis domains, including the time, frequency, statistical, and spectral domains [19]. It should be noted that no single feature extraction method is suitable for all applications and BCI systems [19]. Although the most informative EEG features for emotion classification are still being researched, power features obtained from different frequency bands are widely recognized as the most popular features. In the studies [20–22], power spectral density (PSD) from EEG signals worked well for identifying emotional states.
However, feature extraction usually generates high-dimensional and redundant features. Feature selection and dimension reduction are necessary to avoid over-specification and to reduce the computational burden [3]. Compared to filter and wrapper methods for feature selection, dimension reduction methods, e.g., principal component analysis (PCA) and Fisher linear discriminant (FLD), are more efficient. For further information about feature selection and dimension reduction, we refer the reader to [23,24]. Many machine learning algorithms have been introduced as EEG-based emotion classifiers, such as the support vector machine (SVM) [25,26], Naive Bayes (NB) [27], K-nearest neighbors (KNN), linear discriminant analysis (LDA), random forest (RF), and artificial neural networks (ANN). Among these methods, SVM based on spectral features, e.g., PSD, is the most widely applied approach. In [25], SVM was used to classify the joy, sadness, anger, and pleasure feelings based on the EEG signals from 12 symmetric electrode pairs. SVM was used in [26] for emotion recognition with accuracies of 32% and 37% in the valence and arousal dimensions, respectively. A Gaussian NB in [27] was used to classify low/high valence and arousal emotions with precisions of 57.6% and 62.0%, respectively.

Recently, deep learning (DL) methods have been introduced for EEG-based emotion classification [28,29]. The studies [30,31] proposed a deep belief network (DBN) to discriminate positive, neutral, and negative emotions; the experimental results showed that the DBN performs better than SVM and KNN. In [32], after an effective pre-processing method replacing traditional feature extraction, a hybrid neural network combining a convolutional neural network (CNN) and a recurrent neural network (RNN) was proposed to learn spatial-temporal representations from the pre-processed EEG recordings. The proposed pre-processing strategy improved the emotion recognition accuracies by about 33% and 30% for the valence and arousal dimensions, respectively. In [33], a deep CNN (DCNN) model was introduced to learn discriminative representations from combined features in the raw time domain, after normalization, and in the frequency domain; the obtained emotion classification accuracies were higher than those of the previously best bagging tree (BT) classifier. The study [34] proposed a hierarchical bidirectional gated recurrent unit (GRU) network with an attention mechanism. The proposed scheme learned more significant representations from EEG sequences, and the accuracies obtained on the cross-subject emotion classification task outperformed the long short-term memory (LSTM) network by 4.2% and 4.6% in the valence and arousal dimensions, respectively. Compared to traditional shallow methodologies, DL models remove the signal pre-processing and feature extraction/selection process, and are more suitable for affective representation [35,36]. However, DL methods behave like a black box and cannot reveal the relationship between emotional states and EEG signals [37]. Moreover, the training of DL networks is extremely time-consuming, which limits their practical application in real-time emotion recognition [3].

As aforementioned, the field of affective computing has developed rapidly over the past several years, including the incorporation of DL methodologies. However, the modeling and recognition of emotional states remains an open problem [13,14], and EEG-based emotion recognition still faces several challenges, including fuzzy boundaries between emotions.

Note that logistic regression (LR) [38] has been widely used as a statistical learning model in pattern recognition and machine learning, as well as in EEG signal processing. In [39], LR trained with EEG power spectral features was used for automatic epilepsy diagnosis. The work in [40] further used the wavelet transform to extract effective representations from non-stationary EEG records and adopted LR as a classifier to identify epileptic and non-epileptic seizures. In [41], regularized linear LR was trained on the raw EEG signal without feature extraction to classify imaginary movements. In [42], LR with L2-penalization to avoid overfitting was trained on spectral power features from intracranial EEG (iEEG) signals for the analysis of the brain's encoding states and memory performance. The study in [43] further incorporated t-distributed stochastic neighbor embedding (tSNE) for dimension reduction of iEEG signals, and the learned L2-regularized LR classifier was used for predicting memory encoding success. Despite these studies, the potential of the LR model for EEG-based emotion recognition has still not been fully explored.

In the present study, we systematically introduce the logistic regression (LR) algorithm with a Gaussian kernel and a Laplacian prior [44–46] for EEG-based emotion recognition. Unlike the LR classifiers above, a Gaussian radial basis function (RBF) kernel is used to enhance the data separability in the transformed space [46]. Moreover, the Laplacian prior, which promotes the sparsity of the logistic regressors, acts as an L1-regularization [44]: it forces many components of the logistic regressors to be zero. The learned sparse logistic regressors thus control the complexity of the LR classifier and consequently avoid over-specification in EEG-based emotion recognition. The logistic regression via variable splitting and augmented Lagrangian (LORSAL) algorithm [45] is introduced to optimize the logistic regressors at lower computational complexity; accordingly, the introduced LR method is abbreviated as LORSAL. For an overall evaluation of the LORSAL classifier, various power spectral features and features calculated from combinations of electrodes were used as input for the classifiers. The conventional NB, SVM, linear LR with L1-regularization (LR\_L1), and linear LR with L2-regularization (LR\_L2) were used for comparison to evaluate the performance of the LORSAL classifier. This paper also presents an investigation of critical frequency bands [47,48] and an analysis of the effect of the extracted features on EEG-based emotion classification.

The rest of this paper is organized as follows. Section 2 presents the materials and methods, including the dataset for emotion analysis using EEG, physiological and video signals (DEAP), the various features extracted from the EEG signals, the introduced LR model with Gaussian kernel and Laplacian prior, and the LORSAL algorithm used to learn the LR regressors. The experimental results are shown in Section 3. The introduced method is evaluated on the task of subject-dependent emotion recognition in the valence and arousal dimensions, and the compared methods include NB, SVM, LR\_L1, and LR\_L2. Section 4 gives the discussion and a further comparison of LORSAL and the DL methods. Conclusions and future work are presented in Section 5.

#### **2. Materials and Methods**

#### *2.1. DEAP Dataset and Pre-Processing*

This study was performed on the DEAP dataset developed by researchers at Queen Mary University of London [27]. This dataset is publicly available (http://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html) and consists of multimodal physiological signals for human emotion analysis. It contains, in total, 32-channel EEG recordings and eight peripheral signals of 32 subjects (50 percent female, aged between 19 and 37). Forty carefully selected 1-min videos were used as emotion elicitation materials [27]. As shown in Figure 1, the 2D valence-arousal emotion model by Russell [15] was used to quantitatively describe emotional states. The first dimension, valence, ranges from unpleasant to pleasant, and the second dimension, arousal, ranges from bored to excited. Therefore, the valence-arousal model can describe most variations in human emotion. The well-known self-assessment manikins (SAM) [49] (shown in Figure 2) were adopted for self-assessment along the valence and arousal dimensions, and the corresponding discrete rating values range from 1 to 9, which can be used as labels in emotion analysis tasks [27]. In this paper, the first 32 channels (the EEG recordings, marked in Figure 3) of the DEAP dataset preprocessed in MATLAB format were used. The EEG signals were preprocessed by down-sampling from 512 Hz to 128 Hz and band-pass filtering to 4–45 Hz.

**Figure 1.** 2D valence-arousal emotion model by Russell.

**Figure 2.** Images used for self-assessment manikins (SAM): (**a**) Valence SAM, (**b**) arousal SAM.

**Figure 3.** International 10-20 system for 32 electrodes (marked with blue circles).

In this work, two binary classification problems were posed for subject-dependent emotion recognition: the discrimination of low/high valence (LV/HV), and of low/high arousal (LA/HA). The subjects' ratings (scaled from 1 to 9) given via SAM in the experiments [27] were used as the ground truth, and a threshold of 5 was used to divide the rating values into two categories: LV/HV and LA/HA. The duration of one trial for each subject in the preprocessed EEG sequences is 63 s, of which the first 3 s are baseline signals recorded before watching the video elicitations. These 3 s segments were removed to retain only the stimulus-related dynamics. The remaining 60 s EEG signals (thus, 7680 readings in each EEG channel, in total) were segmented into sixty 1 s epochs, giving 40 \* 60 = 2400 EEG epochs in total for each participant. The EEG data of each subject thus had a dimensionality of 128 (sampling points) \* 32 (EEG channels) \* 2400 (EEG epochs), yielding 2400 labeled EEG epochs per subject. In this paper, for each subject, 10% of the labeled epochs were used to train the emotion classifier, and the remaining 90% for testing. For example, the constructed EEG dataset for the first participant consisted of 960 LV and 1440 HV epochs; 10% of the samples were randomly selected from the LV and HV samples, respectively, so that 240 epochs were selected for training. Ten-fold cross-validation was used to evaluate the introduced LORSAL classifier and the compared traditional methods.
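The segmentation and splitting steps above can be sketched as follows. This is a toy illustration in which random data stands in for a real DEAP recording; the 16/24 trial split is chosen only so that the toy subject mirrors the 960/1440 epoch counts quoted for the first participant:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 128

# Hypothetical stand-in for one subject's preprocessed recording:
# 40 trials x 32 channels x 8064 samples (63 s at 128 Hz).
data = rng.standard_normal((40, 32, 63 * fs))
# Toy SAM valence ratings: 16 trials below threshold 5, 24 above.
ratings = np.array([3.0] * 16 + [7.0] * 24)

data = data[:, :, 3 * fs:]                          # drop the 3 s baseline
# Split each 60 s trial into sixty non-overlapping 1 s epochs.
epochs = data.reshape(40, 32, 60, fs).transpose(0, 2, 1, 3).reshape(-1, 32, fs)
labels = np.repeat((ratings > 5).astype(int), 60)   # threshold 5 -> LV/HV

# Stratified 10%/90% train/test split, as described in the text.
train_idx = np.concatenate([
    rng.choice(np.flatnonzero(labels == c),
               size=int(0.1 * np.sum(labels == c)), replace=False)
    for c in (0, 1)])
test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
```

With these toy ratings, the split yields 2400 labeled epochs of shape 32 × 128, of which 240 (96 LV + 144 HV) go to training.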

#### *2.2. Feature Extraction*

In this study, various power spectral features in the frequency domain and features calculated from combinations of electrodes were extracted from the constructed EEG signals. The extraction of prominent statistical characteristics is important for emotion recognition. Physiological signals such as the EEG are characterized by high complexity and non-stationarity, and power spectral density (PSD) [20–22] from different frequency bands is the most well-known statistical feature in the task of emotion analysis. This benefits from the assumption that EEG signals are stationary for the duration of a trial [50]. Many studies in neuroscience and psychology [51] suggest that five frequency bands are closely linked to psychological activities, including emotional activity: Delta (1 Hz–3 Hz), Theta (4 Hz–7 Hz), Alpha (8 Hz–13 Hz), Beta (14 Hz–30 Hz), and Gamma (31 Hz–50 Hz). The spectrum can be estimated with the fast Fourier transform (FFT) computed via the discrete Fourier transform (DFT) [52], while a common alternative is the short-time Fourier transform (STFT) [53,54]. Here, the PSD features are extracted from the above five frequency bands using a 256-point STFT with a sliding 0.5 s Hanning window and 0.25 s overlap along each 1 s epoch of each EEG channel.
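A minimal sketch of this PSD extraction for a single 1 s epoch, using `scipy.signal.stft` with the window, overlap, and FFT length stated above. Averaging the squared STFT magnitudes over frames and band bins is one plausible reading of the procedure, not the authors' exact implementation:

```python
import numpy as np
from scipy.signal import stft

fs = 128
# Frequency bands as defined in the text (Hz).
bands = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def psd_features(epoch, fs=fs):
    """Per-band PSD for one 1 s multichannel epoch (channels x samples),
    using a 256-point STFT with a 0.5 s Hann window and 0.25 s overlap."""
    f, _, Z = stft(epoch, fs=fs, window="hann",
                   nperseg=fs // 2, noverlap=fs // 4, nfft=256)
    power = (np.abs(Z) ** 2).mean(axis=-1)          # average over STFT frames
    return np.stack([power[:, (f >= lo) & (f <= hi)].mean(axis=-1)
                     for lo, hi in bands.values()], axis=-1)

rng = np.random.default_rng(1)
epoch = rng.standard_normal((32, 128))              # one 1 s epoch, 32 channels
feat = psd_features(epoch)
```

Flattening `feat` over channels and bands gives the 160-dimensional PSD vector used later in the paper.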

Differential entropy (DE) [55,56] measures the complexity of a continuous random variable, extending the Shannon entropy concept [57]. The studies by Zheng et al. [47,48] and Duan et al. [56] introduced DE for emotion classification using low/high-frequency EEG patterns.

The original formula of DE is defined as

$$h(\mathbf{X}) = -\int\_{X} f(\mathbf{x}) \log(f(\mathbf{x})) d\mathbf{x},\tag{1}$$

and the DE of a random variable *X* obeying the Gaussian distribution *N*(µ, σ<sup>2</sup>) can be simply given as:

$$h(X) = -\int\_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \log\left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right) dx = \frac{1}{2} \log\left(2\pi e \sigma^2\right),\tag{2}$$

where π and *e* are constants. According to [55], for a given frequency band, DE equals the logarithm of the spectral energy of a fixed-length EEG segment. Thus, the DE features are calculated in the same five frequency bands as the PSD features.
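Under the Gaussian assumption of Equation (2), the DE of a band-filtered epoch reduces to a function of its variance. A small sketch (estimating σ² by the sample variance is an assumption):

```python
import numpy as np

def differential_entropy(band_signal):
    """DE of a band-pass-filtered EEG segment under a Gaussian assumption:
    h = 0.5 * log(2 * pi * e * var), as in Equation (2)."""
    var = np.var(band_signal, axis=-1)
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Sanity check against the closed form: for x ~ N(0, sigma^2) the estimate
# should approach 0.5 * log(2 * pi * e * sigma^2).
rng = np.random.default_rng(2)
sigma = 2.0
x = sigma * rng.standard_normal(200_000)
h_hat = differential_entropy(x)
h_true = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
```

The estimate converges to the closed form as the segment length grows, which is why a fixed-length recording makes DE interchangeable with the log spectral energy of the band.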

In the literature [58,59], the asymmetric brain activity between the left and right hemispheres is highly related to emotions. In the studies [47,48], the differential asymmetry (DASM) and rational asymmetry (RASM) features were defined as the differences and ratios of the DE features of hemispherically asymmetric electrodes. Here, 14 pairs of asymmetric electrodes are selected to calculate DASM and RASM: Fp1-Fp2, F7-F8, F3-F4, T7-T8, P7-P8, C3-C4, P3-P4, O1-O2, AF3-AF4, FC5-FC6, FC1-FC2, CP5-CP6, CP1-CP2, and PO3-PO4. The DASM and RASM features are given as

$$\text{DASM} = \text{DE}(X\_{left}) - \text{DE}(X\_{right}),\tag{3}$$

and

$$\text{RASM} = \text{DE}(X\_{left}) / \text{DE}(X\_{right}),\tag{4}$$

respectively. As the studies in [58,59] suggest, emotional states are closely linked to the spectral differences of brain activity between the frontal and posterior brain regions. The differential caudality (DCAU) features [47,48] were therefore also adopted in this paper to characterize the spectral asymmetry in the frontal-posterior direction. The DCAU features are given as the differences between the DE features of 11 pairs of frontal-posterior electrodes: FC5-CP5, FC1-CP1, FC2-CP2, FC6-CP6, F7-P7, F3-P3, Fz-Pz, F4-P4, F8-P8, Fp1-O1, and Fp2-O2. The formulation of DCAU is defined as

$$\text{DCAU} = \text{DE}(X\_{frontal}) - \text{DE}(X\_{posterior}).\tag{5}$$

The dimensions of the PSD, DE, DASM, RASM, and DCAU features are 160 (32 channels \* 5 bands), 160 (32 channels \* 5 bands), 70 (14 pairs of electrodes \* 5 bands), 70 (14 pairs of electrodes \* 5 bands), and 55 (11 pairs of electrodes \* 5 bands), respectively. For simplicity, the above-extracted features were used directly and separately as input for the introduced and compared recognition methods.
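Equations (3)-(5) amount to simple pairwise combinations of the per-channel DE matrix. In the sketch below, the channel ordering is assumed to be the standard ordering of the preprocessed DEAP files; treat both the ordering and the dummy DE values as assumptions:

```python
import numpy as np

# Assumed channel order of the 32 DEAP EEG electrodes.
channels = ["Fp1","AF3","F3","F7","FC5","FC1","C3","T7","CP5","CP1","P3","P7",
            "PO3","O1","Oz","Pz","Fp2","AF4","Fz","F4","F8","FC6","FC2","Cz",
            "C4","T8","CP6","CP2","P4","P8","PO4","O2"]
idx = {ch: i for i, ch in enumerate(channels)}

asym_pairs = [("Fp1","Fp2"),("F7","F8"),("F3","F4"),("T7","T8"),("P7","P8"),
              ("C3","C4"),("P3","P4"),("O1","O2"),("AF3","AF4"),("FC5","FC6"),
              ("FC1","FC2"),("CP5","CP6"),("CP1","CP2"),("PO3","PO4")]
caud_pairs = [("FC5","CP5"),("FC1","CP1"),("FC2","CP2"),("FC6","CP6"),
              ("F7","P7"),("F3","P3"),("Fz","Pz"),("F4","P4"),("F8","P8"),
              ("Fp1","O1"),("Fp2","O2")]

def combine(de, pairs, op):
    """Combine per-channel DE features (channels x bands) over electrode pairs."""
    a = de[[idx[l] for l, _ in pairs]]
    b = de[[idx[r] for _, r in pairs]]
    return op(a, b)

rng = np.random.default_rng(3)
de = rng.uniform(1.0, 3.0, size=(32, 5))          # dummy DE features
dasm = combine(de, asym_pairs, np.subtract)       # Equation (3)
rasm = combine(de, asym_pairs, np.divide)         # Equation (4)
dcau = combine(de, caud_pairs, np.subtract)       # Equation (5)
```

Flattening the results reproduces the 70-, 70-, and 55-dimensional DASM, RASM, and DCAU vectors.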

#### *2.3. Logistic Regression with Gaussian Kernel and Laplacian Prior*

Logistic regression (LR) has been a common statistical learning model in pattern recognition and machine learning [38]. Strictly speaking, the applications of LR in EEG signal analysis are not new, as illustrated in the introduction above [39–43]. Despite this, the potential of the LR model for EEG-based emotion recognition has not been fully exploited. In this paper, we systematically introduce the logistic regression (LR) algorithm with a Gaussian kernel and a Laplacian prior [44–46] for emotion recognition with EEG signals.

The goal of a supervised learning algorithm is to train a classifier using training samples in order to recognize the label of an input feature vector among different classes. In EEG-based emotion recognition, the major task is to assign the input EEG signals to one of the given classes. Specifically, in this study, two binary classification problems were posed for subject-dependent emotion recognition: the classification of LV/HV emotions, and of LA/HA emotions.

Using a multinomial LR (MLR) model [38,44], the probability that the input feature **x***<sup>i</sup>* belongs to emotion class *k* is written as

$$p(y\_i = k | \mathbf{x}\_i, \mathbf{w}) = \frac{\exp(\mathbf{w}^{(k)T} \mathbf{h}(\mathbf{x}\_i))}{\sum\_{j=1}^{K} \exp(\mathbf{w}^{(j)T} \mathbf{h}(\mathbf{x}\_i))},\tag{6}$$

where **x**<sub>*i*</sub> is the feature vector extracted from the original EEG sequences, **h**(**x**<sub>*i*</sub>) denotes a vector of functions of the input feature vector **x**<sub>*i*</sub>, and *w* ≡ [*w*<sup>(1)T</sup>, . . . , *w*<sup>(K)T</sup>]<sup>T</sup> collects the logistic regressors. For binary classification tasks (*K* = 2), this is known as the LR model; for *K* > 2, the usual designation is MLR [44]. Although emotion recognition in this paper is a binary classification problem, the formula of the MLR is presented here for completeness: it does not affect the understanding of the model, and it makes the extension to multiple emotion classes straightforward.
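Equation (6) is a softmax over the per-class scores *w*<sup>(k)T</sup>**h**(**x**<sub>*i*</sub>); a minimal vectorized sketch (subtracting the row maximum for numerical stability is an implementation detail not stated in the text):

```python
import numpy as np

def mlr_probabilities(W, H):
    """Class posteriors of Equation (6).
    W: K x D matrix of regressors w^(k); H: N x D matrix of feature vectors h(x_i)."""
    scores = H @ W.T                               # N x K matrix of w^(k)T h(x_i)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)
W = rng.standard_normal((2, 161))                  # K = 2 classes (LV/HV)
H = rng.standard_normal((10, 161))                 # 10 epochs, hypothetical h(x)
P = mlr_probabilities(W, H)
```

Each row of `P` sums to one, giving the posterior over the two emotion classes for one epoch.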

Note that the function **h**(**x**<sub>*i*</sub>) can be linear or nonlinear. In the latter case, kernel functions can be selected to further enhance the separability of the extracted features in the transformed space. In this study, the Gaussian kernel is utilized, given by

$$K(\mathbf{x}\_i, \mathbf{x}\_j) = \exp(-\|\,\mathbf{x}\_i - \mathbf{x}\_j\|^2\,/(2\rho^2)),\tag{7}$$

where ρ controls the width of the kernel.
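For a set of feature vectors, the kernel of Equation (7) can be evaluated for all pairs at once via the expansion ‖x<sub>i</sub> − x<sub>j</sub>‖² = ‖x<sub>i</sub>‖² + ‖x<sub>j</sub>‖² − 2x<sub>i</sub><sup>T</sup>x<sub>j</sub>. A sketch (the choice of ρ here is arbitrary):

```python
import numpy as np

def gaussian_kernel(X, Y, rho):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 rho^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0) / (2 * rho**2))  # clip tiny negatives

rng = np.random.default_rng(4)
X = rng.standard_normal((240, 160))   # e.g., 240 training epochs x 160 DE features
K = gaussian_kernel(X, X, rho=np.sqrt(X.shape[1]))
```

The diagonal of `K` is one (each point is at zero distance from itself), and the kernel matrix serves as the transformed feature representation fed to the classifier.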

In this paper, training the LR classifier using labeled EEG epochs amounts to estimating the class densities and learning the logistic regressors *w*. Following the formulation of the sparse MLR (SMLR) algorithm in [44], the solution for *w* is given by the maximum a posteriori (MAP) estimate

$$\hat{w} = \arg\max\_{w} \; \ell(w) + \log p(w),\tag{8}$$

where ℓ(*w*) indicates the log-likelihood function, given as follows:

$$\ell(\boldsymbol{w}) = \log \prod\_{i=1}^{L} p(y\_i | \mathbf{x}\_i, \mathbf{w}) \, , \tag{9}$$

where *L* denotes the number of training samples, and

$$p(\mathfrak{w}) \propto \exp(-\lambda \parallel \mathfrak{w} \parallel\_1),\tag{10}$$

denotes the Laplacian prior, where ‖*w*‖<sub>1</sub> indicates the L1 norm of *w*, and λ is the regularization parameter. The Laplacian prior forces sparsity on the logistic regressors *w*, promoting many components of *w* to be exactly zero [45,46]. The obtained sparse regressors reduce the complexity of the LR classifier and, therefore, avoid over-specification in EEG-based emotion classification.

The convex problem in Equation (8) is difficult to optimize because of the nonquadratic term ℓ(*w*) and the non-smooth term log *p*(*w*). The studies in [44,60] decomposed the problem in Equation (8) into a sequence of quadratic problems using a majorization-minimization scheme [61]. The SMLR algorithm solves each quadratic problem with a complexity of *O*(((*L* + 1)*K*)<sup>3</sup>) [44]. The fast SMLR (FSMLR) [62] is more efficient, applying a block-based Gauss–Seidel iterative procedure to estimate *w*; it is thus *K*<sup>2</sup> times faster than SMLR, with a complexity of *O*((*L* + 1)<sup>3</sup>*K*).

In this work, the logistic regression via variable splitting and augmented Lagrangian (LORSAL) [45] algorithm is introduced to solve for the LR regressors in Equation (8). LORSAL was originally proposed for hyperspectral image (HSI) classification in the remote sensing community [45,46]. The complexity of LORSAL is *O*((*L* + 1)<sup>2</sup>*K*) for each quadratic problem, compared to the *O*(((*L* + 1)*K*)<sup>3</sup>) and *O*((*L* + 1)<sup>3</sup>*K*) complexities of the SMLR and FSMLR algorithms, respectively. Note that in this paper, we may use LORSAL directly to denote the introduced LR with Gaussian kernel and Laplacian prior.
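The LORSAL solver itself is not reproduced here, but the model it optimizes (kernelized LR with an L1/Laplacian penalty) can be approximated with off-the-shelf tools. The sketch below uses scikit-learn's liblinear-based L1 logistic regression on a Gaussian kernel matrix over toy data; the kernel width and penalty strength are arbitrary assumptions, and the coordinate-descent optimizer differs from LORSAL's variable-splitting scheme:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(6)
# Toy stand-in for per-epoch DE feature vectors of two emotion classes.
X_train = np.vstack([rng.normal(0, 1, (120, 160)), rng.normal(0.5, 1, (120, 160))])
y_train = np.repeat([0, 1], 120)
X_test = np.vstack([rng.normal(0, 1, (50, 160)), rng.normal(0.5, 1, (50, 160))])
y_test = np.repeat([0, 1], 50)

rho = np.sqrt(X_train.shape[1])                      # ad-hoc kernel width
gamma = 1.0 / (2 * rho**2)
K_train = rbf_kernel(X_train, X_train, gamma=gamma)  # Equation (7) on all pairs
K_test = rbf_kernel(X_test, X_train, gamma=gamma)

# The L1 penalty plays the role of the Laplacian prior of Equation (10).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(K_train, y_train)
acc = clf.score(K_test, y_test)
n_zero = int(np.sum(clf.coef_ == 0))                 # sparsity induced by the prior
```

The learned coefficient vector has one entry per training epoch (here 240), and the L1 penalty drives many of them to exactly zero, mirroring the sparse regressors described above.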

#### **3. Experimental Results**

In this work, we systematically investigated the classification performance of the introduced LORSAL method compared with four classifiers, Naive Bayes (NB) [27], the support vector machine (SVM) [25,26], linear LR with L1-regularization (LR\_L1), and linear LR with L2-regularization (LR\_L2), for the binary classification of the LV/HV and LA/HA emotional states. The features, including PSD, DE, DASM, RASM, and DCAU, were extracted from the EEG signals and used directly as inputs for the classifiers. The NB classifier in MATLAB was employed as in [27]. The LIBLINEAR [63] software was adopted for the implementation of the LR\_L1 and LR\_L2 classifiers, with the default cost parameter. The LIBSVM [64] tool was utilized to implement the SVM classifier, using the linear kernel with default parameters. For simplicity, the parameters for the Gaussian kernel and Laplacian prior in the LORSAL method were set to the defaults in [46]. Such parameter settings may not be optimal for EEG-based emotion recognition, but they yielded satisfactory classification performance in the experiments.

#### *3.1. Overall Classification Accuracy*

The mean accuracies and standard deviations obtained by the different classifiers in the valence dimension, for the different features extracted from the five frequency bands (Delta, Theta, Alpha, Beta, and Gamma) and the total frequency bands, are tabulated in Table 1. It should be noted that 'Total' in Table 1 denotes the features obtained by concatenating the features from all frequency bands. Given the same features extracted from the EEG signals, the values in bold in Table 1 indicate the highest accuracies obtained by the compared classifiers for each frequency band, while the values with a gray background denote the highest accuracies obtained by the compared methods across all frequency bands. The LORSAL method obtained the highest accuracy among all the compared classifiers, 77.17%, for the DE feature from the total frequency bands. In the same setting, the highest classification accuracy obtained by SVM is 69.55%, while the best accuracy by NB is 62.36% for the DASM feature from the total frequency bands.


**Table 1.** The mean precisions and standard deviations (%) of the classification of LV/HV emotions obtained by the compared classifiers for different features extracted from different frequency bands.

The SVM classifier is the most widely applied approach based on spectral features, especially PSD. Table 1 shows that the performance of SVM is second only to LORSAL on all features extracted from the total frequency bands, with corresponding accuracies of 69.04%, 69.55%, 64.48%, 48.17%, and 63.48% for the PSD, DE, DASM, RASM, and DCAU features, respectively. In the study [27], the NB classifier obtained an accuracy of 57.6% in the valence dimension. In this study, the mean precisions obtained by NB are approximately between 60% and 62% for the PSD, DE, DASM, and DCAU features from the total frequency bands.

However, the best accuracies obtained by LR\_L1 and LR\_L2 are approximately 46%, which is significantly lower than those obtained by NB, SVM, and LORSAL. Although LR\_L1 and LR\_L2 adopt L1-regularization and L2-regularization, respectively, during the optimization of the LR regressors to avoid over-specification, the assumption of linear separability does not hold for the features extracted from EEG signals. The average precisions obtained by LORSAL improve significantly over those of LR\_L1 after incorporating the Gaussian kernel: the Gaussian kernel enhances the data separability in the transformed space, while the Laplacian prior promotes sparsity in the learned LR regressors and avoids over-specification on the selected training EEG epochs.

For the classification task of LV/HV emotions, the LORSAL method presents the best classification accuracies, 77.17%, 71.63%, and 69.89%, on the DE, DASM, and DCAU features from the total frequency bands, which are higher than those of SVM by about 8%, 7%, and 6%, respectively. The SVM and NB classifiers perform best, with accuracies of 69.04% and 55.65%, on the PSD and RASM features from the 'Total' bands, respectively. As shown in Table 2, the performance of the five classifiers on classifying LA/HA emotions is similar to the LV/HV case. The introduced LORSAL method performed best on the DE, DASM, and DCAU features from the 'Total' bands, and the corresponding accuracies outperformed those of SVM by about 7%, 9%, and 8%, respectively. In addition, the performance of LORSAL was clearly better than that of the compared LR\_L1 and LR\_L2 methods in the arousal dimension. The incorporated Gaussian kernel and Laplacian prior improved the discriminative ability of LORSAL in the EEG-based emotion recognition task.


**Table 2.** The mean precisions and standard deviations (%) of the classification of LA/HA emotions obtained by the compared classifiers for different features extracted from different frequency bands.

For a more comprehensive comparison of the NB, SVM, and LORSAL approaches, Table 3 tabulates the average values and standard deviations of precision, recall, and F1 for the binary emotion classification problems of LV/HV and LA/HA, respectively, with the different features extracted from the total frequency bands. The introduced LORSAL method obtained the best precisions (77.17% ± 6.37% for LV/HV, and 77.03% ± 6.20% for LA/HA), the best recalls (76.79% ± 6.21% for LV/HV, and 76.15% ± 6.14% for LA/HA), and the best F1 values (76.90% ± 6.27% for LV/HV, and 76.47% ± 6.14% for LA/HA) for EEG-based emotion recognition. In summary, the above analysis suggests applying LORSAL to the DE features extracted from the 'Total' bands for EEG-based emotion recognition. For simplicity, we will focus on comparing the performance of LORSAL with NB and SVM in the following subsections.


**Table 3.** The mean metrics and standard deviations (%) of precision, recall, and F1 for the binary classification of LV/HV and LA/HA emotions obtained by the compared classifiers for different features extracted from the total frequency bands.

#### *3.2. Investigation of Critical Frequency Bands*

In this study, the informative features were extracted from different frequency bands (Delta, Theta, Alpha, Beta, Gamma, and Total) for EEG-based emotion recognition. Thus, we present an investigation of the critical frequency bands in EEG signals for emotion processing. Figures 4a–e and 5a–e show the mean precisions obtained by LORSAL, SVM, and NB for the classification of LV/HV and LA/HA, respectively, when the frequency band alternated among Delta, Theta, Alpha, Beta, Gamma, and Total. Gamma and Beta are more informative than the other frequency bands (Delta, Theta, and Alpha). For example, among the first five frequency bands, LORSAL obtained its highest accuracies of 72.93% and 67.06% on the Gamma and Beta bands for the DE features in valence classification, and its best accuracies of 72.73% and 66.57% on the Gamma and Beta bands for the DE features in arousal recognition.

There is not always a causal relationship between features with high recognition accuracies and emotions. Koelstra et al. [27] presented an investigation of the causal relationship between emotions and EEG signals on the DEAP dataset. The average frequency power of trials was calculated over the Theta, Alpha, Beta, and Gamma bands (between 3 and 47 Hz). Spearman correlation coefficients were tabulated in [27] to present the statistical correlation between the power changes of EEG sequences and the subject ratings. Following similar research by Zheng [47], we focused on analyzing the informative neural patterns associated with recognizing different emotions. In particular, the Fisher ratio was used to investigate the critical frequency bands for discriminating different emotions. The Fisher ratio has been used in pattern recognition for class separability measurement and feature selection [65–67], as well as in emotion classification [3,4,13,14,27]. Higher values of the Fisher ratio indicate more informative neural patterns and features related to emotion recognition. It is defined as the ratio of the interclass difference to the intraclass spread

$$F_n(\mathrm{L}, \mathrm{H}) = \frac{\left(m_{Ln} - m_{Hn}\right)^2}{\sigma_{Ln}^2 + \sigma_{Hn}^2}, \tag{11}$$

where L and H denote two different emotions, e.g., LV/HV or LA/HA, and *m*<sub>Ln</sub>, *m*<sub>Hn</sub>, σ<sup>2</sup><sub>Ln</sub>, and σ<sup>2</sup><sub>Hn</sub> denote the means and variances of the *n*-th dimension of the EEG feature belonging to emotions L and H, respectively. Thus, *F<sub>n</sub>*(L, H) indicates the class separability between emotions L and H for the *n*-th dimension of the extracted feature.

**Figure 4.** Effects of the different frequency bands (Delta, Theta, Alpha, Beta, Gamma, and Total) on the classification precisions for LV/HV emotions obtained by the LORSAL, SVM, and NB classifiers for the five different features: (**a**) PSD, (**b**) DE, (**c**) DASM, (**d**) RASM, (**e**) DCAU.

**Figure 5.** Effects of the different frequency bands (Delta, Theta, Alpha, Beta, Gamma, and Total) on the classification precisions for LA/HA emotions obtained by the LORSAL, SVM, and NB classifiers for the five different features: (**a**) PSD, (**b**) DE, (**c**) DASM, (**d**) RASM, (**e**) DCAU.

Given an extracted feature and a specific frequency band, the mean Fisher ratio *F*(L, H) is calculated by averaging the values of *Fn*(L, H) over all the EEG channels (e.g., for PSD and DE) or combinations of electrodes (e.g., for DASM, RASM, and DCAU)

$$F(\mathrm{L}, \mathrm{H}) = \frac{1}{N} \sum_{n=1}^{N} F_n(\mathrm{L}, \mathrm{H}), \tag{12}$$

where *N* is the number of EEG channels or combinations of electrodes.
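Equations (11) and (12) translate directly into code; the sketch below uses synthetic feature matrices whose shapes and class labels are illustrative assumptions, not DEAP data:

```python
import numpy as np

def fisher_ratio(X_low, X_high):
    """Eq. (11) per dimension: squared mean difference over summed variances.
    X_low, X_high: (samples, N) feature matrices for the two emotion classes."""
    m_l, m_h = X_low.mean(axis=0), X_high.mean(axis=0)
    v_l, v_h = X_low.var(axis=0), X_high.var(axis=0)
    return (m_l - m_h) ** 2 / (v_l + v_h)

def mean_fisher_ratio(X_low, X_high):
    """Eq. (12): average of F_n over all N channels/electrode combinations."""
    return fisher_ratio(X_low, X_high).mean()

rng = np.random.default_rng(0)
X_lv = rng.normal(0.0, 1.0, size=(200, 32))  # e.g. low-valence DE features
X_hv = rng.normal(1.0, 1.0, size=(200, 32))  # shifted mean -> separable classes
m = mean_fisher_ratio(X_lv, X_hv)            # roughly 0.5 for this toy setup
```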

Figure 6 shows the Fisher ratio of the extracted PSD, DE, DASM, RASM, and DCAU features along different frequency bands by averaging the values over all subjects in the valence and arousal dimensions, respectively. In addition, Table 4 illustrates the Fisher ratio over different frequency bands by averaging the values over all features and subjects in the valence and arousal dimensions, respectively. The Fisher ratio values shown in Table 4 are calculated by further averaging the values presented in Figure 6 over the five different features. The following subsection presents a comprehensive analysis covering the EEG neural patterns associated with emotions in previous studies [27,47,48], the critical frequency bands, and the informative features for emotion recognition. Specific frequency ranges are highly related to certain brain activities. Neuroscience findings [68,69] reveal that the Alpha bands in EEG signals are associated with attentional processing, while the Beta bands are associated with emotional and cognitive processes. In previous studies on the DEAP dataset, Koelstra et al. [27] reported negative correlations in the Theta, Alpha, and Gamma bands for arousal, and strong correlations in all investigated frequency bands for valence. Similarly, Onton et al. [70] found a positive correlation between valence and the Beta and Gamma bands.

From Figure 6 and Table 4, we found that the Gamma and Beta bands obtained higher Fisher ratio values than the other frequency bands. This means that the features extracted over the Gamma and Beta bands are more effective for discriminating different emotions, which is in accordance with the classification accuracies of the compared classifiers illustrated in Figures 4 and 5. Similarly, Li and Lu [71] showed that the EEG Gamma bands are appropriate for emotion recognition when images are used for emotion elicitation. The studies by Zheng and Lu [47] found specific neural patterns in high-frequency bands for distinguishing negative, neutral, and positive emotions. For negative and neutral emotions, the energy of the Beta and Gamma frequency bands decreases, while positive emotions present higher energy in these two frequency bands. Their experimental results [47] on the SJTU Emotion EEG Dataset (SEED) showed that the KNN, LR\_L2, SVM, and DBN classifiers performed better on the Gamma and Beta frequency bands than on the other bands for the PSD, DE, DASM, RASM, and DCAU features. This showed the informativeness of the EEG Gamma and Beta bands for emotion recognition with film clips as stimuli in the SEED dataset. The emotion elicitation materials in the DEAP dataset are one-minute videos [27]. Our experimental results and findings on DEAP were in accordance with these previous studies [47,48]. Additionally, the total frequency bands concatenating all the original five frequency bands can further improve the emotion recognition performance, and LORSAL obtained the highest mean accuracies in the valence and arousal dimensions for the DE, DASM, and DCAU features, which is consistent with the results in [47,48].

**Table 4.** Fisher ratio of different frequency bands by averaging the values over all features and subjects in valence and arousal dimensions, respectively.


**Figure 6.** Fisher ratio of features extracted on different frequency bands by averaging the values over all subjects in **(a)** valence and **(b)** arousal dimensions, respectively.

#### *3.3. Effect of Extracted Features*

This subsection presents an analysis of the effects of different features on the average accuracies for emotion recognition based on the EEG signal. When the extracted features alternate among PSD, DE, DASM, RASM, and DCAU, the mean precisions obtained by LORSAL, SVM, and NB for the classification of LV/HV and LA/HA are shown in Figures 7a–f and 8a–f. All of the LORSAL, SVM, and NB classifiers performed best on the DE features. Among all classifiers, LORSAL obtained the highest accuracies of 77.17% and 77.03% in the valence and arousal dimensions, respectively, for the DE features extracted from the total frequency bands. In addition, the highest precisions obtained by SVM are 69.55% and 69.92%, respectively, for DE from the total frequency bands. The DE features measure the complexity of continuous random variables [55–57]. EEG signals are characterized by low-frequency energy dominating high-frequency energy, and consequently, DE can distinguish EEG sequences according to their low- and high-frequency energy. These results agree with the findings in [47,48] and further show the superiority of the DE features in EEG-based emotion classification.
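Under the Gaussian assumption commonly used for band-filtered EEG (as in [47]), the DE of an epoch reduces to a closed form in the signal variance; a minimal sketch:

```python
import numpy as np

def differential_entropy(x):
    """DE of a band-filtered EEG epoch under the Gaussian assumption:
    DE = 0.5 * log(2 * pi * e * sigma^2)."""
    var = np.var(x)
    return 0.5 * np.log(2 * np.pi * np.e * var)

# a unit-variance signal has DE = 0.5 * log(2*pi*e), about 1.42 nats
rng = np.random.default_rng(1)
de = differential_entropy(rng.standard_normal(50_000))
```

In practice this would be applied to each channel after band-pass filtering, yielding one DE value per channel per frequency band.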


**Figure 7.** Effects of the different features (PSD, DE, DASM, RASM, and DCAU) on the classification precisions for LV/HV emotions obtained by the LORSAL, SVM, and NB classifiers from the six different frequency bands: (**a**) Delta, (**b**) Theta, (**c**) Alpha, (**d**) Beta, (**e**) Gamma, (**f**) Total.

**Figure 8.** Effects of the different features (PSD, DE, DASM, RASM, and DCAU) on the classification precisions for LA/HA emotions obtained by the LORSAL, SVM, and NB classifiers from the six different frequency bands: (**a**) Delta, (**b**) Theta, (**c**) Alpha, (**d**) Beta, (**e**) Gamma, (**f**) Total.

Moreover, the DASM and DCAU features provide relatively good performance compared to the PSD and DE features. DASM and DCAU are asymmetry features, and previous findings have shown the effectiveness of asymmetrical brain activity along the left-right and frontal-posterior directions in emotion analysis. It is noted that the dimensions of the DASM and DCAU features are 70 and 55, respectively, which are lower than the 160 dimensions of the PSD and DE features. This makes DASM and DCAU more competitive in computational complexity. These experimental results were also consistent with the findings by Zheng and Lu [47,48].
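As a sketch of how such asymmetry features are formed, DASM subtracts the DE of each right-hemisphere electrode from its left counterpart (DCAU does the same for frontal-posterior pairs); the channel indices and pairs below are illustrative, not the paper's montage:

```python
import numpy as np

def dasm(de, left_idx, right_idx):
    """Differential asymmetry: DE(left electrode) - DE(right electrode).
    de: vector of per-channel DE values; the index lists define the pairs."""
    de = np.asarray(de)
    return de[left_idx] - de[right_idx]

# toy DE vector for 4 channels; pairs (0,1) and (2,3) are hypothetical,
# standing in for pairs such as (Fp1, Fp2) and (F3, F4) in a real montage
d = dasm([1.2, 1.0, 0.8, 0.5], left_idx=[0, 2], right_idx=[1, 3])
```

The dimensionality advantage noted above follows directly: one value per electrode pair instead of one per electrode.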

#### **4. Discussion**

Although the area of affective computing has advanced considerably over the past years, EEG-based emotion recognition remains a challenging problem. This paper introduced LR with a Gaussian kernel and Laplacian prior for EEG-based emotion recognition. The Gaussian kernel enhances the EEG data separability in the transformed space, and the Laplacian prior controls the complexity of the learned LR regressor during training. The LORSAL algorithm was introduced to optimize the LR with Gaussian kernel and Laplacian prior for its low computational complexity. Various spectral power features in the frequency domain and features combining asymmetrical electrodes (PSD, DE, DASM, RASM, and DCAU) were extracted for the Delta, Theta, Alpha, Beta, Gamma, and Total frequency bands using a 256-point STFT on the segmented 1 s EEG epochs.
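LORSAL itself optimizes the sparse kernel LR via variable splitting and an augmented Lagrangian; as a rough, non-equivalent stand-in, one can fit an L1-penalized logistic regression on Gaussian (RBF) kernel features with scikit-learn, where the sparsity-promoting L1 penalty plays the role of the Laplacian prior (the data and hyperparameters below are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

# synthetic stand-in for band-wise EEG feature vectors (not DEAP data)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

K = rbf_kernel(X, X, gamma=0.01)  # Gaussian kernel expansion of the data
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0, max_iter=1000)
clf.fit(K, y)  # L1 penalty stands in for the Laplacian prior on the regressors

acc = clf.score(K, y)            # training accuracy on the kernelized features
n_zero = np.sum(clf.coef_ == 0)  # weights pruned to exactly zero by the L1 term
```

This is only a sketch of the model family; the authors' solver and its complexity properties differ from liblinear's coordinate descent.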

The experiments were conducted on the publicly available DEAP dataset, and the performance of the introduced LORSAL method was compared with the NB, SVM, LR\_L1, and LR\_L2 classifiers. The experimental results showed that LORSAL presented the best accuracies of 77.17% and 77.03% in the valence and arousal dimensions, respectively, on the DE features from the total frequency bands, while the SVM classifier obtained the second-highest accuracies of 69.55% and 69.92%. The other evaluation metrics obtained by LORSAL, SVM, and NB were also tabulated in the paper. The introduced LORSAL method also presented the best recall (76.79% and 76.15% in valence and arousal, respectively) and F1 (76.90% and 76.47% in valence and arousal, respectively). These experimental results showed the superiority of the introduced LORSAL method for EEG-based emotion recognition compared to the NB, SVM, LR\_L1, and LR\_L2 approaches.

This paper also presented an investigation of the critical frequency bands for EEG-based emotion recognition. In this study, the informative features are captured from different frequency bands: Delta, Theta, Alpha, Beta, Gamma, and Total. Previous neuroscience studies showed that specific frequency ranges are associated with specific brain activities. For example, the EEG Alpha bands are related to attentional processing, whereas the Beta bands reflect emotional and cognitive processing. The experimental results showed that the LORSAL, SVM, and NB classifiers performed better on the Gamma and Beta frequency bands than on the other bands for the different features. The comparison of Fisher ratios also showed the effectiveness of the Gamma and Beta bands in emotion recognition. The findings in this study are in accordance with previous work on critical frequency band investigation [47,48].

Additionally, the effects of the different features (PSD, DE, DASM, RASM, and DCAU) on the emotion classification results were also analyzed in this paper. The experimental results show that the compared approaches, LORSAL, SVM, and NB, obtained superior precision metrics on the DE features over the other features. This shows the effectiveness of the DE features in distinguishing low- and high-frequency energy in EEG sequences. Meanwhile, the DASM and DCAU features presented competitive classification accuracies compared to the PSD features. It is noted that DASM and DCAU have the advantage of lower time consumption owing to their lower dimensionality than PSD and DE.

For a more comprehensive analysis, Table 5 shows a comparison of the introduced LORSAL method, other shallow classifiers, and deep learning approaches for EEG-based emotion recognition of LV/HV and LA/HA on the DEAP dataset. In single-trial classification by Koelstra et al. [27], NB after feature selection using Fisher's linear discriminant obtained accuracies of 57.65% and 62.0% in the valence and arousal dimensions. In [72], a Bayesian weighted-log-posterior function optimized with the perceptron convergence algorithm presented average precisions of 70.9% and 70.1% for valence and arousal. For within-subject emotion recognition of LV/HV and LA/HA, Atkinson et al. [73] presented accuracies of 73.41% and 73.06% using minimum-Redundancy-Maximum-Relevance (mRMR) feature selection. Rozgić et al. [74] performed classification using segment-level decision fusion and presented precisions of 76.9% and 69.4% for discriminating LV/HV and LA/HA emotions. In the studies by Zheng et al. [48], the discriminative graph regularized extreme learning machine (GELM) with DE features achieved the highest average accuracy of 69.67% for 4-class classification in the VA emotion space. The introduced LORSAL classifier presented the best evaluation metrics for EEG emotion recognition among the compared NB, SVM, LR\_L1, and LR\_L2 methods in the experiments.

Recently, deep learning (DL) methods have been used for EEG-based emotion classification [28,29]. In [75], a hybrid DL model combining CNN and RNN learned task-related features from grid-like EEG frames and achieved accuracies of 72.06% and 74.12% for valence and arousal. The DNN and CNN models by Tripathi et al. [76] achieved precisions of 75.78% and 73.12% (DNN) and 81.41% and 73.36% (CNN) along the valence and arousal dimensions, respectively. The classification accuracies for valence and arousal were over 85% using the LSTM-RNN by Alhagry et al. [77], and over 87% using the 3D-CNN by Salama et al. [78]. More recently, Chen et al. [33,34] have extensively studied the combination of DL models and various features. As tabulated in Table 5, the computer vision CNN (CVCNN), global spatial filter CNN (GSCNN), and global space local time filter CNN (GSLTCNN) [33] presented obvious improvements when concatenating PSD, raw EEG features, and normalized EEG signals. In [34], the proposed hierarchical bidirectional gated recurrent unit (H-ATT-BGRU) network performed better on raw EEG signals than CNN and LSTM, and the obtained accuracies in the valence and arousal dimensions were 67.9% and 66.5% for 2-class cross-subject emotion recognition. For more details about the DL architectures applied to the DEAP data, readers may refer to the literature [33,34,75–78]. Compared to traditional shallow methods, the DL schemes remove the signal pre-processing and feature extraction/selection processes, and are more suitable for affective representation [35,36]. However, the DL methods cannot reveal the relationship between emotional states and EEG signals, as they behave like a black box [37].

More importantly, the training of DL networks is extremely time-consuming, which limits their practical application in real-time emotion recognition [3]. Craik et al. [28] noted that, from a practical standpoint, DL methods suffer from very long computation times and vanishing/exploding gradients, and their practical application requires an extra graphics processing unit (GPU). Roy et al. [29] pointed out that, from a practical point of view, the hyperparameter search of a DL algorithm often takes up a lot of training time. Additionally, Craik et al. [28] and Roy et al. [29] provide comprehensive reviews of recent DL schemes.

To illustrate the time efficiency, the average training times of the compared NB, SVM, MLR\_L1, MLR\_L2, and LORSAL methods are shown in Table 6. The average running time for the STFT-based feature extraction is 68.15 s. In our experiment, all programs were run on a computer with an Intel Core i5-4590 at 3.30 GHz and 8.00 GB RAM. LORSAL takes no more than 4 s for training, and its computing time is of the same order as those of the compared traditional shallow methods. As mentioned earlier, the complexity of LORSAL is *O*((*L* + 1)<sup>2</sup>*K*) for each quadratic problem, where *L* is the number of EEG epochs used for training and *K* is the number of emotion classes. As shown in Table 6, the time consumptions of LORSAL on DE, PSD, DASM, RASM, and DCAU (with different dimensions 160, 160, 70, 70, and 55) are nearly the same. Given limited computational resources, or with portable devices, the introduced LORSAL algorithm has higher time efficiency than the DL methods and can present better performance than the compared shallow methods.

**Table 5.** Comparison of the introduced LORSAL methods, the other shallow classifiers, and the deep learning approaches for EEG-based emotion recognition of LV/HV and LA/HA on DEAP dataset.



**Table 6.** Running times of the compared NB, SVM, MLR\_L1, MLR\_L2, and LORSAL methods in seconds; the average time consumption of feature extraction is 68.15 s.

#### **5. Conclusions and Future Work**

This paper systematically investigated the introduced LORSAL algorithm for EEG-based emotion classification. Additionally, the critical frequency bands (Delta, Theta, Alpha, Beta, and Gamma) and the effectiveness of the different features (PSD, DE, DASM, RASM, and DCAU) for emotion recognition were also analyzed. The LORSAL classifier performs better than the compared shallow methods and offers superior time efficiency compared to the recent DL approaches.

The performance and application of LORSAL-based emotion recognition should be further researched in future work. More informative and representative features can be used with LORSAL. As shown in Table 5, in the research by Chen et al. [33], SVM achieved higher AUC (Area Under the ROC Curve) values, 0.9234 and 0.9426, for classifying LV/HV and LA/HA emotions by concatenating PSD and raw pre-processed EEG signals than with other features. We will try to integrate different features to train the LORSAL classifier. Future attempts include the application of LORSAL to 4-class emotion classification in VA space, following the studies in [48]. A further comparison of LORSAL and DL methods, and the combination of their advantages in feature extraction and avoiding overfitting, will be investigated. Future work could also include applying LORSAL to multimodal information, e.g., fNIRS and other physiological signals in brain activity analysis [79,80].

**Author Contributions:** Conceptualization, C.P. and C.S.; methodology, C.P. and C.S.; software, C.P. and C.S.; validation, C.P.; formal analysis, C.P. and C.S.; investigation, C.P.; writing—original draft preparation, C.P.; writing—review and editing, C.S.; supervision, H.M., J.L. and X.G.; funding acquisition, C.P., C.S., and H.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China under Grant 61902313, the Fundamental Research Funds for the Central Universities, Xidian University, No. RW190110, and the Construction Project Achievement of College Counselor Studio of Shaanxi Province: Reach Perfection with Morality Studio.

**Acknowledgments:** The authors would like to thank J. Li for providing the source codes of the LORSAL algorithm on the website (http://www.lx.it.pt/~jun/).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Deep Learning for EEG-Based Preference Classification in Neuromarketing**

#### **Mashael Aldayel 1,2,\* ,† , Mourad Ykhlef 1,† and Abeer Al-Nafjan 3,†**


Received: 14 January 2020; Accepted: 19 February 2020; Published: 24 February 2020

#### **Featured Application: This article presents an application of deep learning in preference detection performed using EEG-based BCI.**

**Abstract:** Traditional marketing methodologies (e.g., television commercials and newspaper advertisements) may be unsuccessful at selling products because they do not robustly stimulate consumers to purchase a particular product. Such conventional marketing methods attempt to determine the attitude of consumers toward a product, which may not represent the real behavior at the point of purchase. Marketers are likely to misunderstand consumer behavior because the predicted attitude does not always reflect the real purchasing behavior of consumers. This research study was aimed at bridging the gap between traditional market research, which relies on explicit consumer responses, and neuromarketing research, which reflects implicit consumer responses. EEG-based preference recognition in neuromarketing was extensively reviewed. Another gap in neuromarketing research is the lack of extensive data-mining approaches for the prediction and classification of consumer preferences. Therefore, in this work, a deep-learning approach is adopted to detect consumer preferences using EEG signals from the DEAP dataset, considering the power spectral density and valence features. The results demonstrated that, although the proposed deep-learning approach exhibits higher accuracy, recall, and precision than the k-nearest neighbor and support vector machine algorithms, random forest reaches similar results to deep learning on the same dataset.

**Keywords:** neuromarketing; brain computer interface (BCI); consumer preferences; EEG signal; deep learning; deep neural network (DNN)

#### **1. Introduction**

Neuromarketing is an emerging field that links the cognitive and affective sides of the consumer behavior by using neuroscience. Since its origin in 2002, this field has rapidly achieved credibility among the advertising and marketing specialists, and many such specialists are adopting neuromarketing strategies [1].

Neuromarketing can assist marketers in understanding how a consumer's brain evaluates the diverse brands and recognizing the factors that affect the consumers' choices when purchasing products. Neuromarketing research has demonstrated that people do not always recognize what happens in their unconscious brains. Furthermore, it has been demonstrated that people are not always explicit in their preferences or intentions [2].

The use of traditional marketing tools, such as interviews and questionnaires, to assess consumer preferences, needs, and buying intentions can lead to the generation of biased or incorrect conclusions [3,4]. Similarly, an oral expression of preferences can produce conscious or unconscious biases. It is difficult to extract the consumer preferences directly through choices, owing to the high product costs, ethical caution considerations, or the product not having been invented at the time of evaluation [3]. These elements highlight a contradiction in the users' opinions during the usability assessments and their actual opinions, feelings, and senses regarding the use of a product [4].

Therefore, neuromarketing requires more effective methodological alternatives to evaluate consumer behavior. Novel neuroimaging procedures provide an effective approach to study consumer behavior. Such methods ultimately help marketers examine consumers' brains to obtain valuable insights into the subconscious processes underlying successful or failed marketing messages. This information is obtained by eliminating the primary problem in traditional advertising research, namely the reliance on self-reports, whether from consumers or from workers who report on how consumers are influenced by a specific part of an advertisement [1].

Brain–computer interfaces (BCIs) are promising neuroimaging tools in neuromarketing. This technology allows the users to communicate effectively with computer systems. A BCI does not require the use of any external devices or muscle interference to produce commands [5]. Furthermore, a BCI employs voluntarily generated user brain activity to control a system through signals, which provides the ability to communicate or interact with the nearby environment. Electroencephalography (EEG) is one of the main instruments used to examine brain activity. The EEG technique is the only practical, versatile, affordable, portable, non-invasive BCI to perform repetitive, real-time analysis of brain interactions in high temporal resolution [5–7].

Therefore, in the present research study, EEG was adopted as the input brain signal for a BCI system. Using classification algorithms, BCIs can serve as neural measures to distinguish preference patterns from brain signals and translate them into actions to promote a product. In addition, a deep neural network (DNN) was implemented and its performance examined in modeling a benchmark dataset for preference classification.

The main objective of this research was to deeply investigate EEG-based preference recognition in neuromarketing to enhance the accuracy of classification prediction by comparing the performance of deep-learning with other conventional classification algorithms, such as support vector machine (SVM), random forest (RF), and k-nearest neighbors (KNN).

#### **2. Background**

This section provides a review of the main concepts used in this research: neuromarketing, BCI, and EEG.

#### *2.1. Neuromarketing*

Traditional marketing approaches include surveys, interviews, questionnaires, and focus groups, in which consumers openly and consciously report their experiences and opinions. However, such traditional approaches cannot evaluate the unconscious side of consumer behavior. Neuroscience has the potential to discern the unconscious motivations that influence the act of making choices. It has been reported that approximately 90% of data are processed subconsciously in the human mind [8]. In the field of neuromarketing and consumer neuroscience, the evaluation of subconscious activities exposes the true preferences of consumers more accurately than traditional marketing research does. Furthermore, neuromarketing can reveal information regarding consumer preferences/ratings that cannot be accurately determined through traditional methods. This is because subconscious opinions play a key role in consumer decision-making. Traditional market research approaches fail to assess the subconscious activities in the consumer brain, which leads to a discrepancy between the results of traditional market research and the real behavior of consumers at the point of purchase [8].

The term "neuromarketing" is derived by combining the prefix "neuro" and the term "marketing", indicating the integration of two study areas: neuroscience and marketing [1]. Neuroscience is a field that examines the facets of the brain at the biological level and from a psychological perspective [2]. In addition, neuroscience has significantly enlightened the field of marketing, and the interaction between these fields assists in intuiting the consumer behavior [8].

The term "neuromarketing" began to emerge organically around 2002. At that time, a few corporations, such as Brighthouse and SalesBrain, began offering neuromarketing studies and consultations, motivating the application of technology and knowledge from cognitive neuroscience to the field of marketing. Neuromarketing values the study of the consumer behavior from a psychological perspective [1]. Recently, several high-profile companies have begun exploiting neuromarketing approaches to assess the advertisements before introducing products to consumers [9]. This neuromarketing approach has gradually gained favor with brand executives in major corporations, such as Coca-Cola and Campbell's [10].

Neuromarketing researchers aim to use neuroscientific procedures to understand consumer behavior (i.e., requests, needs, and preferences) when shoppers purchase goods. This factor represents the researchers' primary motivation for examining consumers' sensorimotor, mental, and affective feedback on products and advertisements through various modalities [9]. There are several neuromarketing modalities besides BCI, such as eye-tracking, galvanic skin response, skin conductance, facial coding, and facial electromyography. Each modality records a different neural measure [11]. Eye tracking is used to determine eye locations and eye movement to grasp the consumer's attention and natural responses to marketing stimuli. Galvanic skin response measures the moisture activity, which is related to the emotional state. Electromyography is used to evaluate the physiological features of facial muscles. Facial coding measures emotional states through facial expressions.

#### *2.2. BCIs*

BCIs are some of the most promising neuroimaging technologies in the neuromarketing domain. This technology helps facilitate effective communication between the users and computer systems. BCIs do not require any nerves, muscles, or movement interferences to issue a command [5] and employ the voluntarily-generated user brain activity to control a system through signals to communicate or interact with the nearby environment. Such environments can include wheelchairs, artificial arms/hands, and entertainment applications that involve skillful visualization, digital painting, and game playing [6].

BCI systems have contributed to numerous fields, including manufacturing, education, marketing, smart transportation, biomedical engineering, clinical neurology, and neuroscience [5,12]. A BCI system includes an input (i.e., the user's mental activity), output (i.e., states or commands), a decoder component between the input and output, and a protocol that regulates the beginning, offset, and timing of the action [6]. The BCI research is expected to lead to an approach in which the brain signals are operated to aid people in interaction actions [6].

In a BCI, the brain signals require processing in non-clinical situations, which corresponds to a new challenge in computational neuroscience research. Currently, most of the application-oriented BCI research is focused on endowing users—not only disabled people—the ability to control systems or sensors with various environments [6].

Different neuroimaging techniques can be used with non-invasive BCIs, such as EEG, fNIRS, fMRI, PET, MEG, SST, and TMS. EEG has better temporal resolution than fNIRS, which is a relatively new neuroimaging technique in neuromarketing research. However, recent fNIRS research is still in the validation phase [13,14]. EEG is most commonly used in neuromarketing research due to its advantages, which are detailed in the next subsection.

#### *2.3. EEG*

The EEG is a widely used tool that examines the brain activity. The electrical activity is recorded on the scalp by evaluating the voltage variations from neurons firing in the brain. These electrical activities are logged over a period of time using several electrodes positioned on the scalp directly above the cortex. The electrodes are connected in a hat-like device [5,7]. The EEG has the following key benefits: it is non-invasive, portable, cost effective, and relatively simple to use, and it has an exceptional temporal resolution (up to milliseconds). However, the signal-to-noise ratio and spatial resolution are restricted compared with those of other techniques. Nevertheless, EEG is considered to be the only practical, non-invasive BCI input to realize a repetitive, real-time brain interactive analysis [5–7]. Therefore, EEG was selected as the input brain signal for the BCI in this research.

The international 10-20 system is a standard method for naming electrodes based on their location on the scalp. The name reflects the inter-electrode spacing: the distance between neighboring electrodes is either 10% or 20% of the total front-to-back or right-to-left distance across the scalp, as depicted in Figure 1. The 10-20 standard has been widely adopted across diverse EEG systems to increase the dependability of the signals and improve the signal-to-noise ratio [5,7].

**Figure 1.** The international 10-20 system.

EEG Signals

The brain produces abundant neural activity, which can be captured as EEG signals for the BCI. These neural activities consist of two types: (1) rhythms; and (2) transient activities. The EEG activity can be further categorized on the basis of these types of activities [6,7].

#### 1. **Rhythms**:

Rhythms, neural oscillations, or brainwaves are repetitive forms of neural activity. The rhythms are measures of collective synaptic, neuronal, and axonal activities of the neuronal sets. The EEG activity is characterized by separating the frequencies into bands, denoted as delta, theta, alpha, beta, gamma, and mu rhythms. Table 1 presents the details of the EEG rhythms, ranges of frequency, amplitude, and shape, as well as the brain regions in which these activities are the most common along with the events usually associated with the type of band [6,7].

These frequency bands have been linked to affective reactions. The theta band in the front-center of the brain reflects the emotional processing when a consumer looks at a product. The alpha band on the prefrontal cortex differentiates between the positive and negative emotional valences. The beta band is correlated with the alterations during affective arousal. Finally, the gamma band is largely associated with the arousal effects [15].


**Table 1.** Categorization of the EEG rhythms based on their frequency.

#### 2. **Transient activities**

The transient activities or field potentials reflect the action potentials of certain neurons in a manner similar to spikes. These spikes can be recognized by their position, frequency, amplitude, shape, recurrence, and operational properties. Event-related potentials (ERPs) and event-related spectral perturbations (ERSPs) are common types of transient activities [3,16].

ERP is the most common spike and arises as a reaction to a specific event or stimulus. These spikes have extremely small amplitudes. Consequently, the EEG samples must be averaged over many iterations to uncover the ERPs and eliminate noise fluctuations [16]. Table 2 presents the common ERPs used in the neuromarketing research. ERSPs compute the reaction to a stimulus over a period of time and are similar to the ERPs. However, the ERSPs split the EEG signals into the diverse frequency bands to test whether a variation exists in the power of a specified frequency band over time [3].

**Table 2.** Common ERP components used in neuromarketing studies.


#### **3. Literature Review**

This section details EEG-based preference recognition, specifically, the neural correlation of the preference, predictive features of the preference, and preference classification algorithms.

Preference can be defined as a human attitude toward a collection of entities that is mirrored in an explicit decision-making procedure. It can also be an evaluative judgment in the sense of liking or disliking an object [19]. The possibility of measuring conscious and unconscious brain activity while assessing advertisements, by examining the consumer's processing of the advertisement message, cognitive workload, and emotional state, cannot be disregarded. The idea of a 'buy button' in the brain may be exaggerated; however, the research efforts to utilize neural measures in monitoring consumer thought processes are not trivial [20]. Understanding the neural processes behind preference, feelings, and decision-making can enhance the prediction of user preferences and choices, and neuromarketing provides a precise, objective determination of the implicit preferences of consumers [21].

Several studies [10,22–24] have shown that the EEG can be used to determine the consumer preferences. To better utilize the EEG in consumer neuroscience research, the psychological processes underlying the consumer preferences must be understood.

In the following subsection, we describe the neural correlations of the EEG-based preference. Next, we classify the relevant studies into: (1) predictive features; and (2) classification algorithms of the preference recognition. Finally, we explain how the preferences can be detected using BCI.

#### *3.1. Neural Correlations of the Preference*

This subsection explains the neural elements correlated with the preferences. Certain areas of the brain are responsible for various cognitive and mental functions. To determine the positions of EEG electrodes, the underlying brain regions that are responsible for preference processing must be understood. Studies have demonstrated that the preference is linked to the frontal brain regions, specifically, the medial prefrontal cortex, nucleus accumbens [19,25], and medial orbitofrontal cortex [19,26].

Knutson et al. [25] linked the choice prediction of the products to the nucleus accumbens. When a consumer views the product, a higher activation of this region indicates a higher probability of the consumer purchasing that product. Furthermore, Kirk et al. [26] proved the relationship between the contextual preference and the medial orbitofrontal cortex; a higher activation in this region is related to higher level of preferences.

Recording the neural activity correlated with a certain function requires placing the electrodes directly above the corresponding brain area. Figure 2 shows the main electrode positions and the associated neural activity according to the 10-20 [27].

**Figure 2.** EEG electrodes and related neural functions.

Although many researchers [24–26] have proved that the medial-frontal cortex is responsible for the preference function, no consensus exists on which electrodes should be used within the same brain area. Table 3 summarizes the electrode positions used in the preference recognition research.

The authors of [24] showed that the medial-frontal cortex is linked to individual preference in the beta range (16–18 Hz) on the mid-frontal areas at electrodes AFz, F2, FC1, and FCz. Moreover, the authors showed that population preference is linked to the frontocentral areas in the gamma range (60–100 Hz) at electrodes F1, F2, F4, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, C4, and CP5.

Agarwal et al. [27] linked preference attributes such as attention, emotions, and liking to electrodes F3, C3, P3, Pz, Fz, Cz, and C4.

Vecchiato et al. [28] found that asymmetrical increments of the theta and alpha bands are linked to watching pleasant (unpleasant) advertisements, as noted in the left (right) brain areas at electrodes Fp1 (Fp2), AF7 (AF8), F7 (F8), and F1 (F2). The spectral power for alpha bands increases noticeably for liked advertisements at electrode F1 and for the disliked advertisements at electrodes AF8 and AF4. In the theta band, increased activity occurs at electrodes F2, AF8, and F3 for the disliked advertisements and at Fp1 for the liked advertisements.

Touchette et al. [29] found that the frontal asymmetry in the alpha band is linked to the consumers' unconscious reactions to the product attractiveness at electrodes F3 and F4. Vecchiato et al. [28] found that the asymmetrical frontal activity is statistically significantly positive in the alpha and theta bands between F1 and F2. In addition, this activity is significantly negative in the theta band between Fp2 and Fp1, AF8 and AF7, and F8 and F7.


**Table 3.** Common rhythms/ERP and electrode positions used for the preference detection in neuromarketing.

#### EEG Indices

Based on our literature review, we identified four autonomic EEG indices that have been utilized for evaluating people's reactions to marketing stimuli: (1) the approach–withdrawal (AW) motivation index; (2) effort index; (3) choice index; and (4) valence. Such indices assist marketers in understanding customer responses to products [30,31].

#### 1. **AW Index**

The AW index is also known as the frontal alpha asymmetry, which indicates motivation, desire, or approach–avoidance tendency. The frontal asymmetry theory, which originated in 1985, states that the frontal regions of the left and right hemispheres are responsible for positive feelings (approach motivation) and negative feelings (withdrawal motivation), respectively [29]. This index can be defined as the difference between the two hemispheres in the prefrontal alpha band, that is, the relative engagement of the frontal left hemisphere compared with the right one. Positive AW values correspond to positive motivation (approach behaviors), expressed as higher activation of the left frontal cortex. In contrast, negative AW values correspond to negative motivation (avoidance behaviors), expressed as higher activation of the right frontal cortex [29–32].

Numerous researchers have demonstrated the reliability of the frontal alpha asymmetry as an effective marker in emotion and neuromarketing research [29–34]. Touchette [29] calculated the frontal alpha asymmetry score as the difference between the right and left alpha power spectral densities divided by their sum, obtained using electrodes F4 and F3:

$$AW = \frac{\text{alpha}(F4) - \text{alpha}(F3)}{\text{alpha}(F4) + \text{alpha}(F3)} \tag{1}$$
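Equation (1) can be sketched in Python as follows; the alpha-band power values at F4 and F3 are assumed to come from an earlier spectral analysis step:

```python
def aw_index(alpha_f4, alpha_f3):
    """Approach-withdrawal index (Equation (1)): normalized difference of
    right (F4) and left (F3) frontal alpha band power."""
    return (alpha_f4 - alpha_f3) / (alpha_f4 + alpha_f3)

# Relatively higher right-hemisphere alpha gives a positive AW value
# (interpreted as approach motivation); equal power gives zero.
print(aw_index(2.0, 1.0))  # ≈ 0.333
print(aw_index(1.0, 1.0))  # → 0.0
```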

#### 2. **Effort Index**

The effort index is defined as the frontal theta activity in the prefrontal cortex. A higher theta power in the frontal region has been linked to higher levels of task difficulty and complexity. This index acts as a sign of cognitive processing that results from mental fatigue [33], and it has been investigated extensively in neuromarketing research [3,24,28,33,35]. This factor demonstrates the importance of positive and negative emotional processing for the creation of the steady memory traces during advertising [30].

#### 3. **Choice Index**

The choice index is defined in terms of the frontal asymmetric gamma and beta oscillations, which are mostly linked to the real decision-making stage. It is also the most related element to willingness-to-pay responses, especially in the gamma band, for evaluating consumer preference and choice. Higher values of gamma and beta bands indicate a stronger activation of the left prefrontal region, and lower values are linked to relatively stronger activation of the right region [32]. Ramsoy et al. calculated the choice index for each band individually (gamma and beta) using electrodes AF3 and AF4 according to Equation (2):

$$\text{Choice index} = \frac{\log(AF3) - \log(AF4)}{\log(AF3) + \log(AF4)} \tag{2}$$
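A minimal sketch of Equation (2); `power_af3` and `power_af4` stand for the band power (gamma or beta) at electrodes AF3 and AF4, and the sketch assumes power values greater than 1 so that both logarithms (and their sum) are positive:

```python
import math

def choice_index(power_af3, power_af4):
    """Choice index (Equation (2)) for a single band (gamma or beta):
    normalized difference of the log band powers at AF3 and AF4.
    Assumes power values > 1 so the denominator is positive."""
    left, right = math.log(power_af3), math.log(power_af4)
    return (left - right) / (left + right)

# Higher left (AF3) power yields a positive choice index.
print(choice_index(math.e ** 2, math.e))  # ≈ 0.333
```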

#### 4. **Valence**

Frontal asymmetry has been linked to preferences expressed as valence (i.e., the direction of a customer's emotional state). Left and right frontal activation is related to positive and negative valence, respectively. Numerous studies have supported the hypothesis that the frontal EEG asymmetry is an indicator of valence [34].

#### *3.2. Predictive Features for the Preferences*

This section reports on the studies that focused on the predictive features of neuroscience methods that can aid marketers in forecasting consumer preferences, as described in Table 4. Most of these studies employed distinctions of the standard regression analyses toward their prediction models. We classified these predictive features based on the EEG signal types: (1) rhythms; and (2) transient activities.


**Table 4.** EEG-based neuromarketing studies.

#### 3.2.1. Rhythms as Features

The beta and gamma oscillations from consumers who watched movie trailers were utilized to predict the box office sales and recall [24]. These factors were also used as an indicator of the willingness-to-pay to evaluate consumer preference and choice [32].

The alpha oscillations were used to compute the neural likeness and forecast recall and ticket sales. High-frequency EEG components were connected to both the individual preference (beta wave) and population preference (gamma wave) [24].

In addition, alpha frontal asymmetry was linked to the consumers' unconscious reactions to the product attractiveness [29]. Similarly, Modica et al. [33] linked the higher alpha frequencies to comfort food as well as foreign food products. Moreover, awarded campaigns (i.e., the campaigns that received prizes) in anti-smoking public service announcements were linked to higher alpha frequencies [30].

Lower theta frequencies were associated with the negative results toward choosing products [3]. Moreover, these frequencies have been linked to effective anti-smoking public service announcements [30] and foreign products compared with local products [33].

#### 3.2.2. Transient Activities as Features

In the cognitive processes associated with preferences, several research studies considered the ERP components N400, N200, and P300, each of which can be described as follows.

#### 1. **N400**

Some researchers [10] found that the N400 component can reflect familiarity in forecasting hits in brand extension. A powerful association with well-known brand names was replicated in the case of larger N400 amplitudes, foreseeing greater consumer preference. Another brand extension study [38] reported that the N400 component is associated with the unconscious conceptual categorization of products and brands, albeit not with conscious assessments.

#### 2. **N200**

Some other researchers [36] observed that the N200 amplitude exposed a relationship between the emotional state and brand extension categories. This relationship appeared only with negative emotions and moderate brand extensions. A second study [43] suggested that N200 could indicate the product preferences, as determined by spontaneous procedures, whereas the LPP and PSW could indicate the product preferences, as determined by the conscious cognitive procedures. In a third study, the Cerebro system [27] combined the N200 mean, N200 minima, and ERSP to rank products according to customer preferences. Similarly, in a fourth study [3], the researchers used two methodologies, ERSP and ERP, to forecast the product preferences by examining theta brainwaves, N200, and FNR.

#### 3. **P300**

The consumer preferences for the expanded brand labels were clarified using greater P300 amplitudes [17]. The authors of [40] used P300 as a measure of the consumer preferences for certain product features.

Other researchers [18,39] have considered factors that influence the purchasing decisions. The authors of [18] investigated the roles of mathematical ability, gender, pricing, and discount promotions in the process of consumer purchasing using the active BCI. The authors correlated the 'buy' decisions with ERP components, such as P200 and P300. To understand the product preferences, the authors also evaluated the relative importance (mutual information) of the diverse product (i.e., cracker) characteristics involved in the decision-making process by evaluating the cognitive processing by using the EEG alpha, beta, and theta brainwaves [39]. The researchers used eye tracking for choosing the preferred product. Michael et al. [45] used the same approach (EEG with eye tracking) to investigate the emotional reactions of tourism preferences by using different stimuli (words, images, and video). The authors observed that the images had higher affective responses than those of words in travel decision-making driven by the unconscious preference.

The authors of [22] built a predictive model for consumer product choice from the EEG data. The researchers studied the roles of gender and age in the process of consumer preferences in terms of liking/disliking by using a passive BCI. Another research study [20] involved the use of an inductive research method to evaluate three successful and three unsuccessful advertisements by using a dense array EEG data. The results suggest that statistically significant ERP differences existed between the successful and unsuccessful advertisements.

#### *3.3. Preference Classification Algorithms*

Although considerable progress has been made in connecting the brain activities to the user choice, indications that neural assessment could genuinely be beneficial for forecasting the success of marketing activities remain limited [24]. The neural assessments can significantly increase the predictive power above and beyond that of the traditional assessments. Because the neural assessments are better predictors than self-reported assessments, the capability of neuroscience methods to forecast the preferences in real-world situations has incredible consequences for marketers. The first study to address this was published in 2007, and it was concluded that the pre-decisional activation in the related brain areas could be used to forecast the consequent choices [46]. Since then, many neuromarketing studies have published similar conclusions.

In recent years, it has become common practice to use multivariate methods, such as pattern classification, to predict choices. For example, a classification approach can be used to predict out-of-sample choices from "non-choice" neural responses to different products. The resulting models, which are founded on basic neuroscience methods, are more reliable for predicting new states and settings than traditional market methods, such as focus groups and questionnaires. Moreover, these methods are more likely to be scalable, providing marketers with a deeper understanding of consumers and crucial economic outcomes [46].

Preference modeling using data-mining approaches can be classified into three general signal domains: time, frequency, and combined time–frequency. Time-based preference modeling exploits the discovery of ERPs, as discussed in Section 3.2.2. Frequency-based modeling relies on features gained from power spectrum analyses over the delta, theta, alpha, beta, and gamma frequency bands, as explained in Section 3.2.1. In addition, different frequency-based feature extraction methods can be used; for instance, common spatial patterns (CSPs) and spectral filters were used for music preference classification with an SVM, and accuracies of 74.77% and 68.22%, respectively, were obtained [47]. In [48], the fast Fourier transform (FFT) was used for feature extraction, and an SVM obtained an accuracy of 82.14%. In another study, the researchers used the FFT with radial SVMs for preference classification and obtained an accuracy of 75.44% [49]. The last preference model combines time and frequency by analyzing the power spectrum at time intervals that cover the entire post-stimulus period. Several traditional data-mining algorithms have been applied to classify preferences, and different time–frequency analysis (TF) approaches have been considered [15,23,50] to detect user preferences for music. The use of KNN led to accuracies of 86.5% and 83.34% with two TF approaches, namely the Hilbert–Huang Spectrum (HHS) and the spectrogram, respectively [15]. In an extended study using familiar music data, Hadjidimitriou and Hadjileontiadis [51] achieved a considerably higher accuracy of 91.0%. Another work performed music preference classification using TF approaches, namely the discrete Fourier transform with a KNN, and an accuracy of 97.99% was achieved.
The researchers could achieve a similar accuracy result when using the quadratic discriminant analysis (QDA) at 97.39% [52].

Most researchers applied variations of standard regression analysis to their prediction models. However, numerous techniques and methods have been developed to process EEG to determine the preference state of consumers by using classification algorithms. A review of some experimental neuromarketing articles and comparisons of computational approaches is presented in Table 5.


**Table 5.** Computational approaches for assessing the customer preferences.

Some preference studies involved the use of more than two classification algorithms to discover well-matched classifiers for a definite feature set [12]. Chew et al. [54] measured the user preferences for the aesthetics presented as virtual 3D shapes by using EEG. The researchers used the frequency bands as features to classify EEG into two classes—liked and disliked—by using the KNN and SVM and achieved high classification accuracies of 80% and 75%, respectively. However, these results cannot be considered reliable because the authors used an extremely small dataset of five subjects. In their extended study [55,56], the authors increased the number of subjects to 16 but better results were not obtained. Hakim et al. [44] achieved an accuracy of 68.5% by using the SVM to predict the most and least favored products by combining EEG measures with questionnaire measures.

Classifier combinations such as boosting, voting, or stacking can be used to join numerous classifiers, by merging their outputs and/or training them to complement each other and improve their performance [57]. The selection of the classification algorithms in a BCI system is mostly based on both the form of the acquired mental signals and the context in which the application is expected to be used. However, LDA and SVM are the most commonly applied classification algorithms and have been used in more than half of the EEG-based BCI articles.

Another categorization of the classifications was based on the survey research [57], which considered the BCI and machine learning literature from 2007 to 2017. The findings of the recently designed classification algorithms were divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning, and deep learning. The adaptive classifiers are classifiers whose parameters, such as the feature weights, are gradually re-assessed and revised over time as new EEG data are presented. The matrix and tensor classifiers (multi-way array) avoid the use of the filters and feature selections and map the data directly onto a certain space with appropriate measures. The transfer learning approach aims to improve the performance of a learned classifier trained on a domain based on the information acquired, while learning another domain or task.

In recent times, deep learning has been employed in EEG-based preference recognition. DNNs are collections of artificial neurons organized in layers to estimate a nonlinear decision boundary. The most popular type of DNN used for BCIs is the multi-layer perceptron (MLP), which normally consists of only one or two hidden layers. Other DNN types have been explored less frequently, such as Gaussian classifier neural networks or learning vector quantization neural networks [57]. Furthermore, Teo et al. [55,56] proposed deep learning approaches for preference recognition using 3D rotating objects. The results show that a deep network can obtain higher accuracy than other machine learning classifiers, such as the SVM, RF, and KNN algorithms. In their extended research, Teo et al. [23] improved the accuracy to 79.76% by using a deep network with a dropout architecture, rectified linear units, and tanh activations.

Table 6 presents some neuromarketing studies that used different classification algorithms to obtain the most accurate results in predicting the consumer preferences. The review highlights the need to use more features and hybrid classifiers to improve the accuracy results of the predictions [22,44].

#### *3.4. Preference Detection Using a BCI*

This section explains the design process of the neuromarketing experiment to predict the consumer preferences and choices. First, a BCI device must be placed on the head of a consumer. Next, the consumer is asked to look at the products. During the recording phase, the EEG data are recorded concurrently while the consumer views a product. After viewing each product, the user is asked for his or her preference toward the product on a five- or nine-point subjective rating scale. When all products have been displayed, the subjective ratings must be manually labeled as liked or disliked classes. Next, the EEG signals undergo signal preprocessing and feature extraction. The classification module is developed based on the ground truth established by the consumer's subjective ratings.

Figure 3 presents a proposed BCI system for the preference detection composed of three main modules: signal preprocessing, feature extraction and selection, and classification.

**Figure 3.** EEG-based consumer preference prediction system.


**Table 6.** Classification algorithms applied for recognizing the consumer preferences.

#### **4. Proposed System for the EEG-based Preference Detection**

The performance of EEG recognition systems depends on the selection of a feature extraction technique and a classification algorithm. In our study, we investigated the possibility of detecting two preference states, namely pleasant and unpleasant, by using EEG and classification algorithms. To this end, we performed rigorous offline analysis to investigate computational intelligence for preference detection and classification. We applied deep learning classification to the DEAP dataset to explore how intelligent computational methods, in the form of classification algorithms, can effectively mirror the preference states of the subjects. Furthermore, we compared our classification performance with those of the KNN and RF classifiers. We built our model in the open-source programming language Python and used the Scikit-Learn toolbox for machine learning, along with SciPy for EEG filtering and preprocessing, MNE for EEG-specific signal processing, and the Keras library for deep learning.

In this section, we discuss our methodology along with some implementation details of the proposed system for EEG-based preference detection. We begin with describing the benchmark dataset and ground truth of the preference labeling. Next, we explain the feature extraction. Finally, we illustrate the DNN classification model.

#### *4.1. Dataset Description*

DEAP [58] is a benchmark EEG database developed for affective analysis. The DEAP database was built at the Queen Mary University of London, and it has been used in several research studies for preference detection [59,60]. Table 7 summarizes some characteristics of the DEAP dataset.


#### *4.2. Preference Modeling and Ground Truth*

To set the true preferences (ground-truth table), we used the DEAP self-assessment reports, which record the valence dimension on a nine-point Likert scale, to identify the preference states. In this study, we considered the valence dimension as a preference indicator aligned with the target preference states: pleasant and unpleasant. Moreover, we considered EEG trials that had at least two different valence levels, low and high. The valence levels are classified as follows: (1) low valence if the valence rating is between 1 and 5; and (2) high valence if the valence rating is between 6 and 9. The presence of a low or high valence indicates an unpleasant or pleasant preference state, respectively.
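The labeling rule above can be sketched as follows. DEAP valence ratings are continuous on the 1–9 scale, so placing the cut-off at 5 (ratings up to 5 are low, above 5 high) is one possible reading of the two ranges stated above:

```python
def label_preference(valence_rating):
    """Map a DEAP self-assessment valence rating (1-9 scale) to a
    preference class: low valence (<= 5) -> 'unpleasant',
    high valence (> 5) -> 'pleasant'."""
    return "unpleasant" if valence_rating <= 5 else "pleasant"

# Example ratings drawn across the scale (values are illustrative)
labels = [label_preference(r) for r in (2.1, 5.0, 6.8, 9.0)]
```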

#### *4.3. Data Pre-Processing*

We used the preprocessed EEG dataset from the DEAP database, in which the original recordings at 512 Hz were down-sampled to 128 Hz, a bandpass frequency filter from 4.0 to 45.0 Hz was applied, and the EOG artifacts were eliminated from the signals using a blind source separation method, namely independent component analysis (ICA). The data were averaged and segmented into 60-s trials. Then, we applied a channel selection step as a dimensionality reduction technique. The aim of this step was to reduce the number of features and/or channels used by selecting a subset that excludes very high-dimensional and noisy data. Ideally, the features that are meaningful or useful in the classification stage are identified and selected, while others, including outliers and artifacts, are omitted. Moreover, this step reduces the computational cost of the subsequent stages. Therefore, we kept only the channels of interest (Fz, AF4, AF3, F4, and F3).
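The channel-selection step can be sketched as follows; the channel-name list would normally be read from the dataset's metadata, and the names and ordering used in the toy example are illustrative, not the actual DEAP channel order:

```python
import numpy as np

CHANNELS_OF_INTEREST = ["Fz", "AF4", "AF3", "F4", "F3"]

def select_channels(eeg, channel_names, keep=CHANNELS_OF_INTEREST):
    """Keep only the frontal channels used for preference detection.
    eeg: array of shape (n_channels, n_samples); channel_names lists
    the labels in the same order as the rows of eeg."""
    idx = [channel_names.index(name) for name in keep]
    return eeg[idx, :]

# Toy example: 6 hypothetical channels, 4 samples per channel
names = ["Fp1", "AF3", "F3", "Fz", "AF4", "F4"]
data = np.arange(24).reshape(6, 4)
reduced = select_channels(data, names)  # shape (5, 4)
```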

#### *4.4. Feature Extraction*

Feature extraction plays a crucial role in building EEG-based BCI applications. Thus, we extracted the EEG frequency bands by using a power spectral density (PSD) method called the Welch method. Subsequently, we used the resulting frequency bands to calculate the valence as a preference indicator. Figure 4 presents the block diagram of feature extraction.

**Figure 4.** Feature extraction block diagram.

#### 4.4.1. PSD

The PSD is one of the most popular feature extraction methods based on frequency-domain analysis in neuromarketing research. Research studies [11,37,39] have demonstrated that the PSD obtained from EEG signals works well for determining consumer preferences. The PSD method converts data from the time domain to the frequency domain. This conversion is based on the FFT, which calculates the discrete Fourier transform and its inverse.

We used the PSD technique in this study to divide each EEG signal into four different frequency bands: theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–40 Hz). The Python signal processing toolbox (MNE) was used for PSD calculation, and the average power over the frequency bands was computed to build a feature using the avgpower function in the MNE toolbox.
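The band-power feature step can be sketched as follows. This sketch uses SciPy's Welch estimator for self-containedness, whereas the study itself used MNE's implementation; the `nperseg` choice is an assumption:

```python
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 40)}

def band_powers(signal, fs=128):
    """Average Welch PSD power per frequency band for one EEG channel."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    return {band: psd[(freqs >= lo) & (freqs < hi)].mean()
            for band, (lo, hi) in BANDS.items()}

# Sanity check: a pure 10 Hz sine should concentrate power in alpha
t = np.arange(0, 10, 1 / 128)
powers = band_powers(np.sin(2 * np.pi * 10 * t))
```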

#### 4.4.2. Valence

The valence was selected as the measure of preference in this study. Strong valence is reflected in the activation of frontal EEG asymmetry [34]. In the DEAP dataset [58], there was a high correlation between valence and the EEG frequency bands, as shown in Figure 5. An increase in valence led to a power increase in the alpha band, which is consistent with the results of a similar study [34]. We did not use the liking rating in the DEAP dataset because the data owners [58] reported conflicting findings between the activation in left alpha power and liking.

**Figure 5.** The average correlations for all subjects of the valence ratings with the power of different frequency bands. The highlighted electrodes correlate significantly (*p* < 0.05) with the valence ratings. © [2011] IEEE. Reprinted, with permission, from: *IEEE Trans. Affect. Comput.*, DEAP: A Database for Emotion Analysis Using Physiological Signals, Mühl, C.; Lee, J.s. [58].

We applied different valence equations and investigated their relationship with the DEAP self-assessment valence measurement. For the valence calculation, we used the extracted alpha and beta band powers from the DEAP data and considered only the following electrodes: Fz, AF3, F3, AF4, and F4. Finally, we computed the valence values using four different equations (Equations (3)–(6)), which are explained in detail in a previous paper [34] authored by an author of this paper.

$$\text{Valence} = \frac{\text{beta}(AF3, F3)}{\text{alpha}(AF3, F3)} - \frac{\text{beta}(AF4, F4)}{\text{alpha}(AF4, F4)} \tag{3}$$

$$\text{Valence} = \ln[\text{alpha}(Fz, AF3, F3)] - \ln[\text{alpha}(Fz, AF4, F4)] \tag{4}$$

$$\text{Valence} = \text{alpha}(F4) - \text{beta}(F3) \tag{5}$$

$$\text{Valence} = \frac{\text{alpha}(F4)}{\text{beta}(F4)} - \frac{\text{alpha}(F3)}{\text{beta}(F3)} \tag{6}$$
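A sketch of Equations (3)–(6) in Python. Band powers are passed as per-electrode dictionaries; summing the powers of grouped electrodes (e.g., beta(AF3, F3)) is an assumption about the intended reading of the multi-electrode terms:

```python
import math

def valence_eq3(alpha, beta):
    """Equation (3): left-vs-right frontal beta/alpha ratio asymmetry.
    `alpha`/`beta` map electrode names to band powers; grouped
    electrodes are summed (one possible reading of the equation)."""
    left = (beta["AF3"] + beta["F3"]) / (alpha["AF3"] + alpha["F3"])
    right = (beta["AF4"] + beta["F4"]) / (alpha["AF4"] + alpha["F4"])
    return left - right

def valence_eq4(alpha):
    """Equation (4): log alpha-power asymmetry over frontal sites."""
    left = alpha["Fz"] + alpha["AF3"] + alpha["F3"]
    right = alpha["Fz"] + alpha["AF4"] + alpha["F4"]
    return math.log(left) - math.log(right)

def valence_eq5(alpha, beta):
    """Equation (5): alpha power at F4 minus beta power at F3."""
    return alpha["F4"] - beta["F3"]

def valence_eq6(alpha, beta):
    """Equation (6): alpha/beta ratio asymmetry at F4 vs. F3."""
    return alpha["F4"] / beta["F4"] - alpha["F3"] / beta["F3"]
```

With perfectly symmetric band powers across hemispheres, the asymmetry-based Equations (3), (4), and (6) all evaluate to zero, which provides a quick sanity check.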

#### *4.5. DNN Classification*

Deep learning has proven to be an effective tool for making EEG signals meaningful because of its ability to learn feature representations from the raw data. DNNs are models consisting of stacked layers of "neurons," in which each layer applies a linear transformation to its input. The result of each layer's transformation is then passed through a nonlinear activation function, and the parameters of these transformations are learned by minimizing a cost function [61]. The DNN operates in one forward direction, from the input neurons through the hidden neurons (if any) to the output neurons. Assuming that the sample window length is <sup>*s*</sup>, the input of the DNN for the EEG signals is a multidimensional array *X<sub>i</sub>* ∈ R<sup>*e*×*s*</sup> that contains the *s* samples of a window for all *e* electrodes. The fully connected layer, the most common type of layer used in building a DNN, consists of fully connected neurons: the input of every neuron is the activation of each neuron in the previous layer [61].
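A single fully connected layer, as described above, can be sketched as follows. The layer width (16) and the input shape are illustrative; the paper's actual DNN dimensions are given in Section 4.5 below.

```python
import numpy as np

def dense(x, W, b, activation=None):
    """One fully connected layer: linear transform, then an optional nonlinearity."""
    z = x @ W + b
    return np.maximum(z, 0.0) if activation == "relu" else z

rng = np.random.default_rng(0)
e, s = 32, 128                                  # electrodes x samples per window
x = rng.standard_normal((e, s)).reshape(-1)     # flatten the window into one vector
W = rng.standard_normal((e * s, 16)) * 0.01     # weights for a 16-unit layer
h = dense(x, W, np.zeros(16), activation="relu")
assert h.shape == (16,) and (h >= 0).all()      # ReLU output is non-negative
```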

Our study aimed to detect two preference states in the EEG data. Therefore, we employed intelligent classification algorithms that could effectively mirror the preferences of the subjects. We proposed a DNN classifier and compared its performance with those of the KNN, RF, and SVM classifiers.

The block diagram of the proposed DNN classifier is shown in Figure 6. First, the extracted features are normalized using minimum–maximum normalization (Equation (7)) and then fed into the DNN classifier.

$$
x_{\text{scaled}} = \frac{x - \min(x)}{\max(x) - \min(x)}\tag{7}
$$
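Equation (7) applied column-wise to a feature matrix can be sketched as:

```python
import numpy as np

def min_max_scale(X):
    """Eq. (7): rescale each feature column to the range [0, 1]."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

X = np.array([[2.0, 10.0],
              [4.0, 20.0],
              [6.0, 40.0]])
Xs = min_max_scale(X)
assert Xs.min() == 0.0 and Xs.max() == 1.0  # every column spans [0, 1]
```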

**Figure 6.** Block diagram of the DNN classifier.

In this study, the considered DNN architecture is a fully connected feed-forward neural network with three hidden layers, whose units use rectified linear unit (ReLU) activations. The output is obtained from a softmax layer with a binary cross-entropy cost function. The input layer consists of 2367 units, and each hidden layer contains approximately 75% of the units of its predecessor: in particular, the first, second, and third hidden layers contain 1800, 1300, and 800 units, respectively. The output layer has one unit per target preference state (2 units). To train the DNN classifier, we used the Adam optimizer with three objective loss functions: binary cross-entropy, categorical cross-entropy, and hinge loss. For the hyperparameters, we adopted reasonable defaults and followed established best practices: the initial learning rate was 0.001, and we reduced it linearly with each epoch so that the learning rate of the last epoch was 0.0001. We set the dropout for the input and hidden layers to 0.1 and 0.05, respectively. The stopping criterion for network training was based on the model's performance on a held-out set: if the network started to over-fit, training was stopped. This early-stopping criterion helps reduce over-fitting. The network was evaluated on a test set containing approximately 20% of the data samples in the dataset.
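The linear learning-rate schedule described above (0.001 at the first epoch down to 0.0001 at the last) can be sketched as follows; the number of epochs is not reported in the text, so 50 here is hypothetical.

```python
def linear_lr(epoch, n_epochs, lr_start=1e-3, lr_end=1e-4):
    """Linearly interpolate the learning rate from lr_start at epoch 0
    to lr_end at the final epoch (epoch index n_epochs - 1)."""
    frac = epoch / (n_epochs - 1)
    return lr_start + frac * (lr_end - lr_start)

n_epochs = 50  # hypothetical; the paper does not report the epoch count
rates = [linear_lr(e, n_epochs) for e in range(n_epochs)]
assert abs(rates[0] - 1e-3) < 1e-12                         # first epoch
assert abs(rates[-1] - 1e-4) < 1e-12                        # last epoch
assert all(a > b for a, b in zip(rates, rates[1:]))         # strictly decreasing
```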

#### **5. Results and Discussion**

We predicted the preference states (pleasant or unpleasant) using different classification algorithms: DNN, RF, KNN, and SVM. We used different evaluation measures: accuracy, recall, and precision. The accuracy was calculated as the weighted average of the binary measurements, where the score of each class is weighted by its prevalence in the real data. Precision is the proportion of pleasant-preference predictions that were actually correct. Recall is the proportion of actual pleasant preferences that were successfully predicted. To evaluate the performance of the classification algorithms, we used different cross-validation methods: holdout (train/test splitting), k-fold cross validation, and leave-one-out cross validation (LOOCV). Table 8 presents the accuracy results of DNN, RF, KNN, and SVM for each cross-validation method. In LOOCV, RF reached the best accuracy of 90%, while DNN reached a comparable 93% with the holdout method. With k-fold validation, KNN achieved the best accuracies of 90% and 91% when k was set to 10 and 20, respectively. Because the best accuracy was achieved using holdout validation, this method was chosen as the base validation for comparison and for tuning the DNN hyperparameter (loss function).
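The three cross-validation schemes above share a single mechanism: partitioning the sample indices into disjoint test folds. A minimal sketch (not the paper's implementation) that covers k-fold and, as the special case k = n, LOOCV:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays for k-fold cross validation.
    k = n gives leave-one-out (LOOCV); a single 80/20 split is holdout."""
    idx = np.random.default_rng(seed).permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold  # train = everything outside the fold

n = 100
folds = list(kfold_indices(n, 10))
assert len(folds) == 10
for train, test in folds:
    assert len(train) + len(test) == n          # folds partition the data
    assert set(train).isdisjoint(test)          # no train/test leakage
loocv = list(kfold_indices(n, n))
assert all(len(test) == 1 for _, test in loocv)  # one held-out sample per fold
```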

The proposed DNN model was compared with three conventional classification algorithms for EEG signals: SVM, RF, and KNN. Table 9 presents the accuracy, recall, and precision results of RF, KNN, and DNN using three different loss functions in the DNN: categorical cross-entropy, binary cross-entropy, and hinge. The KNN classifier achieved its best accuracy of 88% when K was set to 1. Although RF achieved a high accuracy of 92%, the DNN reached the highest accuracy of 94% with the hinge loss function, outperforming the conventional classification algorithms. To verify that the DNN does not suffer from over-fitting, we present the loss per epoch for each loss function. The average loss per epoch of the DNN with the categorical, binary, and hinge functions reached 0.28, 0.24, and 0.23, respectively, as shown in Figure 7.
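The precision and recall measures used in Table 9 follow their standard definitions, which can be sketched as:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision: fraction of predicted positives that are correct.
    Recall: fraction of actual positives that were recovered."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)

# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
prec, rec = precision_recall(y_true, y_pred)
assert abs(prec - 2 / 3) < 1e-12 and abs(rec - 2 / 3) < 1e-12
```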


**Table 8.** Accuracy results of preference detection with DNN, RF, KNN, and SVM using different cross validation methods.

**Table 9.** Results of preference recognition using holdout validation and different classifiers: DNN, SVM, RF, and KNN.


**Figure 7.** Loss per epoch on the training and validation sets in the DNN using different loss functions: (**a**) categorical cross-entropy (average loss rate = 0.28); (**b**) binary cross-entropy (average loss rate = 0.24); and (**c**) hinge (average loss rate = 0.23).

#### **6. Conclusions**

This study proposed a DNN model to detect preferences from EEG signals using the pre-processed DEAP dataset. Two types of features were extracted from the EEG: PSD and valence. This resulted in a set of 2367 unique features describing the EEG activity in each trial. We used different evaluation measures (accuracy, recall, and precision) and various validation methods (holdout, LOOCV, and k-fold cross validation) to test the classifiers' performance. We built four different classifiers, namely the DNN, RF, SVM, and KNN classifiers, which achieved accuracies of 94%, 92%, 62%, and 88%, respectively. The results demonstrate that, although the proposed DNN exhibits higher accuracy, recall, and precision than the KNN and SVM, RF reaches results similar to the DNN on the same dataset. Future research directions will involve exploring DNNs in the context of transfer learning for preference detection.

**Author Contributions:** M.A. conceived, designed, and performed the experiment; analyzed and interpreted the data; and drafted the manuscript. A.A.-N. co-supervised the analysis, reviewed the manuscript, and contributed to the discussion. M.Y. supervised this study. All authors have read and approved the submitted version of the manuscript.

**Acknowledgments:** The authors would like to thank the deanship of scientific research for funding and supporting this research through the initiative of DSR Graduate Students Research Support (GSR) at King Saud University.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
