#### *3.2. Pre-Processing*

The EEG signals recorded from the scalp contained noise due to external interference, so the noise was removed before feature extraction to improve classification results. The data recorded with the Emotiv EEG electrodes carry a DC offset that must be removed before any analysis based on the fast Fourier transform; accordingly, the mean value of each channel was subtracted from its sample values. To reduce muscular artifacts, participants were instructed to minimize head movements during EEG acquisition. Because recordings were made in an eyes-closed condition, blink artifacts were also minimal. The EMOTIV Insight has a frequency response of 1–43 Hz, which keeps the signal free from 50 Hz AC line interference.
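The DC-offset removal described above amounts to per-channel mean subtraction. A minimal sketch, assuming the recording is stored as a NumPy array of shape (channels, samples):

```python
import numpy as np

def remove_dc_offset(eeg):
    """Subtract each channel's mean so FFT-based analysis is not
    dominated by a spurious 0 Hz component.

    eeg: array of shape (n_channels, n_samples).
    """
    return eeg - eeg.mean(axis=1, keepdims=True)
```

The same operation applies unchanged to any number of channels, since the mean is taken along the sample axis only.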

#### *3.3. Feature Extraction and Selection*

Neural oscillatory features are widely used in the literature for EEG-based classification systems. EEG signals are decomposed into different frequency bands. The Welch method, with a window length of 128 samples and 50 percent overlap, was used to estimate power spectral densities. For feature extraction, the power spectral densities of different neural oscillations, namely delta (1–3 Hz), theta (4–7 Hz), alpha (8–12 Hz), beta (13–30 Hz), gamma (25–43 Hz), slow (4–13 Hz), and low beta (13–17 Hz), were computed from each channel. Relative gamma was computed as the ratio of the slow and gamma band powers. Eight features from each of the five channels add up to forty features. Moreover, five alpha and beta asymmetries were calculated, giving a total of forty-five neural oscillatory features. The alpha asymmetries were calculated using the following equations,

$$
\alpha\_f = \frac{\alpha\_{AF4} - \alpha\_{AF3}}{\alpha\_{AF3} + \alpha\_{AF4}}, \tag{1}
$$

$$\alpha\_t = \frac{\alpha\_{T8} - \alpha\_{T7}}{\alpha\_{T8} + \alpha\_{T7}},\tag{2}$$

$$
\alpha\_a = \alpha\_f + \alpha\_t, \tag{3}
$$

where *αf* , *αt*, and *αa* represent the frontal, temporal, and total alpha asymmetries, respectively, and *αchannel* represents the alpha power spectral density of the corresponding frontal and temporal EEG channels. Similarly, the frontal and temporal beta asymmetries were calculated using,

$$
\beta\_f = \frac{\beta\_{AF4} - \beta\_{AF3}}{\beta\_{AF3} + \beta\_{AF4}}, \tag{4}
$$

$$
\beta\_t = \frac{\beta\_{T8} - \beta\_{T7}}{\beta\_{T8} + \beta\_{T7}},
\tag{5}
$$

where *βf* and *βt* represent the frontal and temporal beta asymmetries, and *βchannel* represents the beta power spectral density of the corresponding frontal and temporal EEG channels. Features were selected using a *t*-test to determine the statistical significance of each feature between the stress and control groups. A lower *p*-value returned by the *t*-test indicates that the feature discriminates significantly between the stress and control groups.
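The feature-extraction and selection steps above can be sketched with SciPy's `welch` and `ttest_ind`; the band edges come from the text, while the function names and the 0.05 significance level are assumptions of this sketch:

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import ttest_ind

# Band definitions from the text (Hz).
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 12), "beta": (13, 30),
         "gamma": (25, 43), "slow": (4, 13), "low_beta": (13, 17)}

def band_powers(channel, fs):
    """Welch PSD (128-sample window, 50% overlap) summed over each band."""
    f, pxx = welch(channel, fs=fs, nperseg=128, noverlap=64)
    return {name: pxx[(f >= lo) & (f <= hi)].sum()
            for name, (lo, hi) in BANDS.items()}

def asymmetry(right, left):
    """Normalized hemispheric asymmetry, e.g. (P_AF4 - P_AF3) / (P_AF4 + P_AF3)."""
    return (right - left) / (right + left)

def select_features(stress, control, p_threshold=0.05):
    """Indices of feature columns whose two-sample t-test p-value falls
    below the threshold (0.05 is an assumed significance level)."""
    _, p = ttest_ind(stress, control, axis=0)
    return np.where(p < p_threshold)[0]
```

Here `stress` and `control` are matrices with one row per subject and one column per feature, so the *t*-test is applied column-wise.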

#### *3.4. Subject Labeling*

The proposed method uses two types of labeling for supervised classification. For the questionnaire-based labeling method, the PSS-10 was used to subjectively evaluate the stress of participants. This questionnaire consists of ten questions, each asking the subject about the frequency of stressful events over the last thirty days. The response to each question is on a scale of 0 to 4, where 0 indicates that the event never occurred and 4 indicates frequent occurrence. The total PSS-10 score for each participant therefore ranges between 0 and 40. The participants were divided into two groups, i.e., the control and stress groups, using the PSS score. A threshold was selected for this purpose, given by the following equation,

$$T\_p = \mu \pm \frac{\sigma}{2},\tag{6}$$

where *Tp* is the threshold of the PSS score, *μ* is the mean, and *σ* is the standard deviation of the PSS scores.
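Equation (6) yields a lower and an upper threshold symmetric about the mean score; a minimal sketch:

```python
import numpy as np

def pss_thresholds(scores):
    """Return the pair T_p = mu -/+ sigma/2 computed from PSS-10 scores."""
    mu, sigma = np.mean(scores), np.std(scores)
    return mu - sigma / 2, mu + sigma / 2
```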

The psychologist assigned labels for the stress and control groups after an expert evaluation based on the interview and PSS scores. During the interview, the expert investigated the physical, emotional, behavioral, and cognitive symptoms of stress. Physical symptoms included aches or pain, diarrhea or constipation, nausea, dizziness, chest pain, and rapid heart rate. Emotional symptoms included depression, anxiety, moodiness, irritability, overwhelming feelings, and loneliness. Behavioral and cognitive symptoms included memory problems, inability to concentrate, poor judgment, negativity, racing thoughts, and constant worrying. The interviews were conducted by a psychologist affiliated with a public sector hospital. The labels (control/stress) were assigned to participants by the expert based on the responses and the PSS score of each participant. The eighteen symptoms evaluated by the expert are presented in Table 1. The assigned labels were used as ground truth for training the system on the corresponding EEG recordings of each subject.


**Table 1.** The symptoms evaluated by expert psychologist during the interview process.

#### *3.5. Stress Classification*

In this study, five different types of classifiers were used, which are described briefly in the following subsections to keep the manuscript self-contained.

#### 3.5.1. Support Vector Machine

A support vector machine uses statistical learning theory based on the principle of structural risk minimization. An SVM selects a hyper-plane that separates the feature space into the control and stress groups according to the labels provided. The SVM is a highly efficient classifier and is widely used for stress classification in EEG-based studies [19,25]. The use of an SVM reduces the risk of over-fitting the data and provides good generalization performance.
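As an illustration, an SVM can be trained on the extracted features with scikit-learn; the toolkit, the linear kernel, and the synthetic data below are assumptions of this sketch, not details from the study:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 45 neural-oscillatory features
# (first 20 subjects: control = 0, last 20: stress = 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 45))
y = np.array([0] * 20 + [1] * 20)
X[y == 1] += 0.8  # shift the "stress" class so the groups are separable

clf = SVC(kernel="linear")  # hyper-plane separating the two groups
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
```

The same `fit`/`cross_val_score` pattern applies unchanged to the other classifiers discussed below.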

#### 3.5.2. The Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes' theorem. It uses the maximum a posteriori hypothesis and works well for high-dimensional input data. It is a nonlinear classifier and gives good results on real-world problems. In addition, the Naive Bayes classifier requires only a small amount of training data to estimate its statistical parameters [36].

#### 3.5.3. K-Nearest Neighbors

KNN is an instance-based learning classifier, in which training instances are stored in their original form. A distance function determines the member of the training set nearest to a test example, and that member's class is used as the prediction. The distance function is easily defined when the attributes are numeric, and most instance-based classifiers use the Euclidean distance. The distance between an instance with attribute values *a*1, *a*2, ..., *an* (where *n* is the number of attributes) and one with values *b*1, *b*2, ..., *bn* is defined as,

$$D = \sqrt{\sum\_{k=1}^{n} (a\_k - b\_k)^2}.\tag{7}$$
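Equation (7) and the nearest-neighbor rule can be sketched directly; the helper names below are this sketch's own:

```python
import numpy as np

def euclidean(a, b):
    """D = sqrt(sum_k (a_k - b_k)^2) between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def knn_predict(X_train, y_train, x, k=3):
    """Majority class among the k training instances nearest to x."""
    distances = [euclidean(row, x) for row in X_train]
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(np.asarray(y_train)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```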

#### 3.5.4. Logistic Regression

The logistic regression algorithm guards against over-fitting by penalizing large coefficients. The output is set to one for training instances belonging to the class and zero otherwise. Logistic regression builds a linear model based on a transformed target variable, where a transformation function converts a nonlinear function to a linear one.

#### 3.5.5. Multi-Layer Perceptron

In a multi-layer perceptron, transfer functions such as the sigmoid, rectified linear unit, and hyperbolic tangent map the inputs to the output. The network is trained with back-propagation by minimizing the squared error of the network output, essentially treating the output as an estimate of the class probability, which is given by the following equation,

$$E = \frac{1}{2}(y - f(\mathbf{x}))^2,\tag{8}$$

where *f*(*x*) is the network prediction obtained from the output unit and *y* is the instance class label.

#### **4. Results and Discussion**
