#### *4.1. Dataset*

A total of 33 participants related to the education field volunteered for this study. The participants reported no history of brain injury, and none were using medication that could have affected their brain activity at the time of the experiment. Among these 33 healthy participants, 20 were male and 13 were female (60.6% male and 39.4% female). The participants' ages ranged from 18 to 40 years (*μ* = 23.85, *SD* = 5.48). In line with the Helsinki Declaration [37] and the departmental ethics guidelines, all participants were briefed about the research goals, and signed informed consent was obtained from each participant. This study was approved by the Directorate of Advanced Studies and Research at the University of Engineering and Technology, Taxila.

#### *4.2. Performance Parameters*

The parameters used in this study include average accuracy rate, Kappa statistic, F-measure, mean absolute error (MAE), and root mean absolute error (RMAE). Accuracy is the ratio of correctly classified instances to the total number of instances in the recorded data. F-measure is the harmonic mean of the precision and recall values. The Kappa statistic has a maximum value of 1, which means perfect classification, while 0 represents chance-level classification; a value less than zero shows that the classification is worse than chance level. For stress classification, the generalization performance of the proposed system was tested using cross validation to avoid over- and under-fitting and to make sure that the proposed system adapts well to unseen data. A 10-fold cross validation technique was used in this study: the data was randomly divided into ten equal parts, and in each round nine parts were used for training and one part for testing. The process was repeated 10 times, so that every instance was used for testing exactly once and for training of the classifier in the remaining rounds.
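The definitions above can be illustrated with a minimal sketch. It assumes binary labels (1 for stress, 0 for control), takes the Kappa statistic to be Cohen's kappa, and uses made-up label vectors; the function names are hypothetical and are not taken from the study. MAE and RMAE are left out because the text does not define them.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement expected from the marginal label frequencies.
    p_e = sum((list(y_true).count(c) / n) * (list(y_pred).count(c) / n)
              for c in labels)
    if p_e == 1:
        return 0.0  # kappa is undefined here; 0.0 is a choice of this sketch
    return (p_o - p_e) / (1 - p_e)


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# Made-up labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f_measure(y_true, y_pred),
      cohen_kappa(y_true, y_pred))
```

With balanced made-up labels like these, accuracy and F-measure coincide, while kappa is lower because half of the agreement is already expected by chance.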

#### *4.3. Stress and Control Group*

The scores acquired from participants using the PSS questionnaire are shown in Figure 4. The green and red bars represent the PSS scores of participants belonging to the control and stress groups, respectively. The yellow bars indicate the PSS scores of participants not considered in either the stress or the control group. Overall, the PSS scores have *μ* ± *σ* = 20.4 ± 6.14. A participant with a PSS score below 17.33 was considered to be in the control group, whereas a participant with a PSS score higher than 23.47 was categorized in the stress group. These values were calculated using the threshold criteria defined in Equation (6). Hence, 12 participants were put into the stress group (red bars) and 9 into the control group (green bars).
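Equation (6) is not reproduced in this section, but the two cut-offs quoted above equal *μ* − *σ*/2 and *μ* + *σ*/2 for the reported mean and standard deviation. The sketch below assumes that rule; the function names are hypothetical, and the labels A, B, and X follow the convention of Table 2.

```python
def pss_thresholds(mean, sd):
    """Return (lower, upper) cut-offs, assuming the rule mean -/+ sd/2."""
    half = sd / 2
    return mean - half, mean + half


def label_by_pss(score, lower, upper):
    """A = control group, B = stress group, X = neutral (as in Table 2)."""
    if score < lower:
        return "A"
    if score > upper:
        return "B"
    return "X"


lower, upper = pss_thresholds(20.4, 6.14)
print(round(lower, 2), round(upper, 2))  # 17.33 23.47

# Hypothetical scores, for illustration only.
print([label_by_pss(s, lower, upper) for s in (12, 20, 27)])  # ['A', 'X', 'B']
```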

In expert (hybrid) evaluation, the psychology expert considered both the PSS scores and the symptoms obtained from the interview method. The expert interviewed each participant for an average duration of 25 min. Out of the 33 participants, 10 were assigned to the stress group and 10 were assigned to the control group. The details of each participant regarding gender, age, PSS score, the label assigned using the PSS score, and the label assigned by the expert are given in Table 2. There were fifteen differences between the labels assigned using PSS scores and those assigned by the expert (hybrid) evaluation. The experimental results show that expert (hybrid) labeling helps in improving the classification of long-term stress. It is important to note here that in a majority of the cases of label mismatch (13 out of 15), the PSS score ranges between 17 and 25, which covers the neutral range. Since we hypothesize that the expert (hybrid) labeling is better suited for the classification task, we have used these labels as ground truth.

**Figure 4.** A graphical representation of Perceived Stress Scale (PSS) scores for participants showing labels assigned using the PSS based labeling method (green: control group, red: stress group, yellow: neutral).


**Table 2.** Gender, age, PSS score, and labels for the participants according to PSS and expert-based (hybrid) labeling (A-control group, B-stress group, X-neutral).

#### *4.4. Feature Selection Using t-Test*

We used a two-sided Student's *t*-test with a significance level of 0.05; the resulting *p*-values for the different EEG oscillations are shown in Table 3. For the *t*-test, the degree of freedom was 9, and the null hypothesis of no difference between the stress and control groups was tested for each feature. At a significance level of 0.05, none of the extracted features were found to be statistically significant between the stress and control conditions when PSS-based labeling was used as the reference standard. In contrast, beta and gamma waves from *AF*3 are statistically significant features between the stress and control groups when labels assigned by expert evaluation were used as the reference standard. Five additional features, namely frontal (*α<sub>f</sub>*) and temporal (*α<sub>t</sub>*) alpha asymmetries, frontal (*β<sub>f</sub>*) and temporal (*β<sub>t</sub>*) beta asymmetries, and alpha asymmetry (*α<sub>a</sub>*), were also used (see Equations (1)–(5)). Results of the *t*-test applied to these features in the stress and control groups are presented in Table 4. It can be seen that alpha asymmetry is statistically different between the stress group and the control group using expert-based labeling. Three significant features, namely beta (AF3), gamma (AF3), and alpha asymmetry, were selected for long-term stress classification based on the results of the *t*-test. The *p*-values of 0.04 and 0.03 for beta and gamma oscillations, respectively, indicated their statistical significance. The *p*-value of alpha asymmetry from frontal and temporal channels was 0.0005, indicating the statistical significance of alpha asymmetry from both temporal and frontal regions.
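To make the selection rule concrete, the following sketch computes a two-sided *p*-value for one feature and compares it with the 0.05 level. It assumes the textbook independent two-sample Student's *t*-test with pooled variance, whose degrees of freedom (*n*<sub>1</sub> + *n*<sub>2</sub> − 2) need not match the value reported above, since the exact test variant is not spelled out here. The function names and sample values are hypothetical, and the *p*-value is obtained by numerically integrating the *t* density rather than by a library call.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass in [0, |t|]
    return max(0.0, 1.0 - 2.0 * central_area)


def is_significant(a, b, alpha=0.05):
    """True when the two-sided p-value falls below the significance level."""
    t, df = pooled_t_statistic(a, b)
    return two_sided_p(t, df) < alpha


# Made-up feature values for two groups (not data from the study).
stress = [1.0, 2.0, 3.0, 4.0, 5.0]
control = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, dof = pooled_t_statistic(stress, control)
print(t_value, dof, two_sided_p(t_value, dof), is_significant(stress, control))
```

A feature would be kept under this sketch only when `is_significant` returns `True` for the labeling method in question.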

The box plots are presented in Figure 5, where the first row represents features acquired through PSS labeling, namely alpha asymmetry, beta, and gamma, respectively. The second row shows the same features acquired through expert evaluation. The + indicates an outlier, and the red line within the box represents the median value. A comparatively short box plot suggests that the feature values are in agreement with each other, whereas a taller box plot suggests that the feature values are more widely spread. From the box plots in Figure 5b,c,e,f, it is observed that the beta and relative gamma features differ little between the stress and control groups for both expert- and PSS-based labels. However, Figure 5a,d are candidates for good features as they appear to differentiate the stress and control groups. In Figure 5a, the alpha asymmetry for the stressed group does not have a long lower whisker, which shows that alpha asymmetry varies little in the lower quartile, while in Figure 5d, the stressed group has varied alpha asymmetry, as shown by the lower and upper whiskers. Also, in Figure 5d the median lies close to the center of the distribution. This suggests that alpha asymmetry is a good candidate to be used in the stress classification task.
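The quantities read off a box plot (median, box height, whiskers, and outliers) can be computed as in the sketch below. It assumes the common convention that whiskers extend to the most extreme points within 1.5 times the interquartile range of the box; the text does not state which rule was used for Figure 5, and the function name and sample values are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Median, quartiles, whisker ends, and outliers under the 1.5*IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # box height: a short box means tightly clustered values
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest point that is not an outlier
        "whisker_high": inside[-1],  # highest point that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up feature values, for illustration only.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this made-up sample the value 30 lies above the upper fence, so it would be drawn as a separate outlier marker and the upper whisker would stop at 9.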


**Table 3.** Results for the *t*-test on various neural oscillations including PSS and expert-based labeling methods.

**Table 4.** Asymmetries in PSS and expert evaluation.


**Figure 5.** Box plots of features. (**a**) Alpha asymmetry; (**b**) beta; (**c**) gamma; (**d**) alpha asymmetry (EE); (**e**) beta (EE); (**f**) gamma (EE); EE represents the labeling method of expert evaluation.
