1. Introduction
Recent technological developments have shifted the role of humans in safety-critical and complex domains, such as autonomous vehicles, aviation, healthcare systems, and industrial plants, from manual to autonomous control [1]. However, because humans remain involved in many of these tasks, the growing sophistication of such processes makes human intervention and control difficult, so there is an urgent need for more accurate and automated systems. An analysis of an individual's cognitive, emotional, and psychological states can provide a solution through brain–computer interface (BCI) technologies [2]. Such information measures the mental states of users to make these environments safer for human–machine interfaces. The brain's physiological activity has been studied with electroencephalograms (EEGs) [3], functional magnetic resonance imaging (fMRI) [4], functional near-infrared spectroscopy (fNIRS) [5], magnetoencephalograms (MEGs) [6], and other forms of biosignals, such as electrooculograms (EOGs) [7], electrocardiograms (ECGs) [7,8], and galvanic skin responses (GSRs) [9], to detect various conditions [10,11]. From the perspective of day-to-day mental activity measurement, issues of size, weight, expense, power consumption, and radioactivity restrict the use of MEGs and fMRI [12]. EOG, ECG, and GSR signals correlate to some degree with mental states (mental fatigue, drowsiness, and stress) [10]. However, such techniques have demonstrated success only in combination with neuroimaging methods linked to the central nervous system [10]. As a result, fNIRS and EEG signals have proved the most appropriate choices for BCI systems [10]. EEG signals are favored over fNIRS signals, as they offer higher sensitivity to variations in brain activity and higher temporal resolution [10]. Moreover, researchers have widely used EEG signals to study emotions, cognitive load, fear of missing out, drowsiness, and schizophrenia, owing to their low-cost, portable, and non-invasive properties [13,14,15,16,17,18].
Many studies have recently been presented for detecting mental states using EEG signals. The mental states of "workload", "fatigue", and "situational awareness" have been studied by examining the correlation between mental workload and EEG signals in different settings, such as in airplane pilots and car drivers [11]. Myrden et al. presented an EEG-BCI model to predict the mental states of frustration, fatigue, and attention; features extracted using the fast Fourier transform (FFT) were classified with linear discriminant analysis (LDA), support vector machines (SVMs), and naive Bayes classifiers [19]. Li et al. recognized silent reading, a comprehension task, a mental arithmetic task, and a question-answering task based on the self-assessment manikin (SAM) model [20]. Nuamah et al. classified five tasks (baseline, visual counting, geometric figure rotation, letter composition, and multiplication) using the short-time Fourier transform (STFT) to extract features, which were classified with an SVM classifier [21]. Liu et al. presented a frequency domain analysis of FFT-based features combined with an SVM to detect attentive and inattentive mental states of students [22]. Ket et al. classified attention, no-attention, and rest states using sample entropy and linear features with an SVM classifier [23].
Wang et al. studied the focus of attention during mathematical problem solving and lane-keeping driving tasks; the central, parietal, frontal, occipital, right-motor, and left-motor power spectra, computed using filtering and independent component analysis (ICA), were classified with an SVM classifier [24]. Djamal et al. evaluated features from raw EEG signals and wavelet decomposition to recognize attention and inattention activities [25]. Arico et al. used stepwise linear discriminant analysis and the analysis of variance (ANOVA) statistical test to detect easy, medium, and hard mental assessments [12]. Hamadicharef et al. developed an attention/non-attention classification model using a combination of filter banks, common spatial patterns, and a Fisher linear discriminant classifier [26]. Mardi et al. used log energy and Higuchi's and Petrosian's fractal dimensions to extract chaotic features for detecting alertness and drowsiness states [27]. Richer et al. evaluated the band power of frequency bands, computed histograms of naive and entropy-based scores using the P2 algorithm, and classified them with binary classifiers [28]. Aci et al. used STFT-based features to detect focused (F), unfocused (UF), and drowsy (D) mental states [29]. Zhang et al. used a deep neural network with six convolutional layers and one output layer to predict the F, UF, and D states [30]. Islam et al. explored multivariate empirical mode decomposition (MEMD) and the discrete wavelet transform (DWT) to detect working and relaxed states; the nonlinear features extracted from intrinsic mode functions and subbands (SBs) were classified with an ensemble classifier [31]. Tiwari et al. performed rhythm-level analysis using filtering and the FFT, employing SVM, k-nearest neighbor (KNN), and random forest classifiers to detect high and low attention levels [32]. Samima and Sarma analyzed rhythms using filtering and artificial neural network (ANN) classifiers for mental workload level assessment [33]. Mohdiwale et al. used DWT-based rhythm analysis with teaching–learning-based optimization for cognitive workload assessment [34]. Easttom and Alsmadi presented a comparative analysis of EMD and variational mode decomposition for extracting nonlinear entropy and Higuchi features for mental state detection [35]. Khare et al. used wavelet-based analysis with the rational dilation wavelet transform (RDWT) to extract five statistical and nonlinear features and classified them using an ensemble classifier to detect various mental states [36]. Kumar et al. analyzed EEG rhythms using the discrete Fourier transform and power spectral density (PSD) to detect mental states with the KNN classifier [37]. Rastogi and Bhateja explored artifact and noise elimination in mental state EEG signals using a stationary wavelet transform (SWT)-enhanced fixed-point fast ICA technique [38].
The methods in the literature have used direct feature extraction from raw EEG signals, statistical analysis, filtering techniques, frequency-based transforms such as the FFT or STFT, rhythm-based analysis, and wavelet-based decomposition. However, direct feature extraction exhibits decreased performance [15], frequency-based transforms involve a time–frequency trade-off [15], filtering and rhythmic analyses require choosing filter coefficients [15], and wavelet-based methods require the selection of a mother wavelet [15]. The experimental and empirical selection of parameters can cause information loss and performance degradation due to misclassification [15]. To overcome these shortcomings, we propose an ensemble-based analysis using advanced decomposition techniques: the tunable Q wavelet transform (TQWT), the multilevel DWT (MDWT), and the flexible analytic wavelet transform (FAWT). Individual and fused features are used for the automated detection of three mental states (F, UF, and D) with an optimizable ensemble technique. The major contributions of the proposed work are listed below:
Analysis of ensemble decomposition techniques using multi-wavelet decomposition.
Statistical analysis to reduce the feature dimensions of multi-wavelet feature analysis for mental state detection.
Analysis of feature fusion to detect the best combination of features.
Exploring an optimized ensemble classifier to determine the optimum hyper-parameter selection.
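As an illustrative sketch of the multi-wavelet decomposition in the first contribution, the MDWT branch (four-level db2 decomposition into five subbands, as described in Section 3) can be written in plain NumPy. The filter taps are the standard Daubechies-4 analysis coefficients; the segment length is a hypothetical choice, not taken from the paper, and the TQWT/FAWT branches (which lack standard open-source implementations) are omitted.

```python
# Minimal NumPy-only sketch of the four-level MDWT with the db2 wavelet.
import numpy as np

SQ3 = np.sqrt(3.0)
# db2 analysis low-pass filter (standard Daubechies 4-tap coefficients)
LO = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2))
HI = LO[::-1] * np.array([1, -1, 1, -1])  # quadrature-mirror high-pass

def dwt_step(x):
    """One analysis step: correlate with each filter, downsample by two."""
    a = np.convolve(x, LO[::-1])[::2]   # approximation coefficients
    d = np.convolve(x, HI[::-1])[::2]   # detail coefficients
    return a, d

def mdwt_subbands(x, level=4):
    """Four-level decomposition -> five subbands [cA4, cD4, cD3, cD2, cD1],
    matching the five EEG rhythms mentioned in the text."""
    details, a = [], x
    for _ in range(level):
        a, d = dwt_step(a)
        details.append(d)
    return [a] + details[::-1]

rng = np.random.default_rng(0)
segment = rng.standard_normal(512)   # one synthetic EEG segment
subbands = mdwt_subbands(segment)
print(len(subbands))  # 5
```

Each step halves the signal length, so the five subbands cover progressively narrower frequency ranges, which is what makes per-subband feature extraction meaningful.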
The remainder of the paper is organized as follows: Section 2 explains the methodology; the results are presented in Section 3; and the discussion and conclusions are presented in Sections 4 and 5, respectively.
3. Results
We aimed to classify mental states using ensemble decomposition and classification algorithms. First, stratification of the EEG signals was performed to obtain 3840 non-overlapping samples for each class. The stratified signals were decomposed into SBs using three wavelet-based decomposition techniques (MDWT, TQWT, and FAWT). We used four-level decomposition with the Daubechies wavelet (db2), yielding five SBs corresponding to the five EEG rhythms. The tuning parameters of the TQWT and the FAWT were set empirically. We extracted 27 features from the SBs of the MDWT, FAWT, and TQWT. The current analysis includes a feature matrix of all channels with 27 features each; therefore, a total of 378 features over 2040 segments were fed into the ensemble classification techniques. The model uses three validation strategies, i.e., HOCV, FFCV, and TFCV, and the same experimental setup was maintained throughout.
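The counts above imply 27 features per channel over 14 channels (14 × 27 = 378); that channel count is an inference from the arithmetic, not stated here. A minimal sketch of assembling such a per-segment feature matrix, with three illustrative stand-in features per subband rather than the paper's 27 per channel:

```python
# Toy assembly of a (segments x features) matrix from per-channel subbands.
# The three features (mean absolute value, variance, log energy) are
# illustrative stand-ins, not the paper's actual 27-feature set.
import numpy as np

def channel_features(subbands):
    """Feature vector for one channel: 3 toy features per subband."""
    feats = []
    for sb in subbands:
        feats += [np.mean(np.abs(sb)),              # mean absolute value
                  np.var(sb),                        # variance
                  np.log(np.sum(sb ** 2) + 1e-12)]   # log energy
    return np.array(feats)

rng = np.random.default_rng(1)
n_channels, n_segments, n_subbands = 14, 8, 5
X = np.stack([
    np.concatenate([
        channel_features([rng.standard_normal(64) for _ in range(n_subbands)])
        for _ in range(n_channels)
    ])
    for _ in range(n_segments)
])
print(X.shape)  # (8, 210)
```

With the paper's 27 features per channel instead of 3 per subband, the same concatenation over 2040 segments would produce the reported 2040 × 378 matrix.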
Table 1 shows the accuracy obtained for each SB using MDWT features. The accuracy of two-class and multiclass classification is highest for SB-1. The model yielded the highest accuracies of 95.07%, 94.93%, and 94.36% for D vs. F using HOCV, FFCV, and TFCV, respectively. For UF vs. F, the highest accuracies were 91.18%, 89.34%, and 88.60%, while for D vs. UF, the accuracies were 88.84%, 89.78%, and 88.53% using the optimizable ensemble classifier with HOCV, FFCV, and TFCV techniques. Similarly, three-class classification yielded the highest accuracies of 87.45%, 87.45%, and 86.27% using HOCV, FFCV, and TFCV.
The accuracy obtained for the TQWT features using an optimized ensemble classifier is shown in Table 2. The accuracy of SB-1 was higher than that of the other SBs. For D vs. F, the optimizable model obtained the highest accuracies of 95.22%, 96.10%, and 94.85% with HOCV, FFCV, and TFCV. For UF vs. F classification, HOCV, FFCV, and TFCV yielded the highest accuracies of 93.01%, 91.32%, and 90.74%. For D vs. UF, the optimizable ensemble classifier yielded accuracies of 90.74%, 90.74%, and 90.22% with HOCV, FFCV, and TFCV. Similarly, the HOCV, FFCV, and TFCV techniques yielded accuracies of 85.78%, 89.82%, and 89.02% for three-class classification.
Table 3 shows the accuracy obtained in each SB using FAWT-based features and the optimizable ensemble classifier. The analysis reveals that the last SB yielded the highest accuracy for the different classification scenarios. The ensemble-based classifier yielded the highest accuracies of 97.79%, 96.91%, and 96.84% for D vs. F classification using the HOCV, FFCV, and TFCV techniques. The model provided the highest accuracies of 93.75%, 92.28%, and 91.01% for UF vs. F, D vs. UF, and D vs. F vs. UF using the HOCV technique. The highest accuracies of 93.09%, 91.10%, and 90.90% for UF vs. F, D vs. UF, and D vs. F vs. UF were obtained with FFCV. The accuracies obtained with TFCV for UF vs. F, D vs. UF, and D vs. F vs. UF were 92.94%, 90.96%, and 90.10%, respectively.
Thus, it is clear from Table 1, Table 2 and Table 3 that the accuracy of our developed model is almost stable across the three validation techniques in the various SBs for different classification scenarios. SB-1 generated the highest accuracy for MDWT and TQWT feature classification, whereas the accuracy yielded by FAWT-based features was highest in SB-7. The analysis also reveals that FAWT-based features provide more discernible characteristics and therefore achieved the highest accuracy over the TQWT- and MDWT-based features. Further, our developed model is consistent across the different classification scenarios (binary and multiclass) with all three validation techniques. The features of the drowsy and focused classes are highly discernible and therefore yielded the highest classification rate among the scenarios; on the other hand, the features of the focused and unfocused classes overlap significantly, resulting in decreased model performance. An exemplary training curve obtained for the optimized ensemble classifier is shown in Figure 4.
As stated earlier, our training and testing feature set comprised all features from all channels. Analyzing the model with all features may increase computation time without improving classification performance [54]. Therefore, we used feature ranking to test our model's performance with an optimal subset of features, using the minimum redundancy feature selection technique. Figure 5 shows the feature ranks obtained for the FAWT-, TQWT-, and MDWT-based features. As seen in Figure 5, out of twenty-seven features, only a few are statistically significant for classification. The feature importance values for FAWT, TQWT, and MDWT decrease significantly or remain the same after six features. This reveals that a similar performance can be obtained using fewer, higher-ranked features.
To obtain further insight into our developed model, we explored a fusion of the most important features of the three decomposition techniques. During fusion, we concatenated the features from all channels according to their ranks. As evident from Table 1, Table 2 and Table 3, SB-1 for the TQWT and MDWT features and SB-7 for the FAWT features yielded the highest accuracy; therefore, we fused the features from these SBs. Table 4 presents the accuracy obtained by feature fusion of the decomposition techniques with different feature combinations. As seen from Table 4, the accuracy yielded by the ensemble model increases with the feature count. The model provides the highest performance with four features; after that, its accuracy decreases slightly or remains constant. Furthermore, our model shows that feature fusion helps to improve system performance. The fusion of all three decomposition techniques yielded the highest accuracy, followed by the fusion of TQWT- and FAWT-based features, while the combination of TQWT and MDWT feature fusion resulted in the lowest performance. Further, to obtain the highest score, we evaluated the best performance measures of TFCV using iterative majority voting (IMV): we conducted multiple rounds of TFCV and selected the one with the best overall and fold-wise accuracy. The model exhibited the highest accuracy of 97.8%, obtained twice during the fold-wise analysis.
Further, we tested the model's performance using four performance metrics, as shown in Table 5. The performance measures show that the drowsy class generated the most discriminant features, with the highest recall, SPE, PPV, and F1 score; the focused class was second best, while the unfocused class exhibited the worst performance. The analysis shows that feature fusion for the drowsy class yields the highest recall, SPE, PPV, and F1 score of 93.13%, 95.91%, 91.76%, and 92.44%, respectively. With the IMV technique, the recall, SPE, PPV, and F1 score for the drowsy class were 97.12%, 99.63%, 99.26%, and 98.18%.
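The four per-class metrics can be computed directly from a multiclass confusion matrix; the 3 × 3 matrix below is hypothetical, not the paper's.

```python
# Per-class recall, specificity (SPE), positive predictive value (PPV),
# and F1 from a multiclass confusion matrix (one-vs-rest counts).
import numpy as np

def per_class_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # missed members of the class
    fp = cm.sum(axis=0) - tp          # others predicted as the class
    tn = cm.sum() - tp - fn - fp
    recall = tp / (tp + fn)
    spe = tn / (tn + fp)
    ppv = tp / (tp + fp)
    f1 = 2 * ppv * recall / (ppv + recall)
    return recall, spe, ppv, f1

# rows: true D, F, UF; columns: predicted D, F, UF (hypothetical counts)
cm = [[95, 2, 3],
      [1, 90, 9],
      [4, 10, 86]]
recall, spe, ppv, f1 = per_class_metrics(cm)
print(np.round(recall, 3))  # [0.95 0.9  0.86]
```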
To obtain more insight into the proposed system, the receiver operating characteristic (ROC) curves and the area under the curve (AUC) were evaluated, as shown in Figure 6. The ROC and AUC of the D vs. F and UF, F vs. D and UF, and UF vs. D and F states for the fused features are shown in Figure 6a–c. The AUC for the drowsy state is 94%, while for the focused and unfocused states it is 95% and 92%, respectively, with an average of 93.67%.
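One-vs-rest AUC values of this kind can be computed with the rank-statistic (Mann–Whitney) form of the AUC; the scores and labels below are synthetic placeholders, not the paper's data.

```python
# One-vs-rest AUC via the Mann-Whitney pairwise-ranking formulation.
import numpy as np

def auc_ovr(scores, y, cls):
    """AUC for class `cls` vs. rest, from per-class decision scores."""
    s = scores[:, cls]
    pos, neg = s[y == cls], s[y != cls]
    # fraction of (positive, negative) pairs ranked correctly
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(4)
y = np.repeat([0, 1, 2], 50)              # D, F, UF
scores = rng.standard_normal((150, 3))
scores[np.arange(150), y] += 2.0          # boost the true class's score
aucs = [auc_ovr(scores, y, c) for c in range(3)]
print([round(a, 2) for a in aucs])
```

The pairwise-ranking form is threshold-free, so it matches what the ROC curves in Figure 6 summarize without having to trace out each curve.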
4. Discussion
We tested the efficacy of our proposed model by comparing it with existing state-of-the-art techniques. Borghini et al. [11] computed the power of the alpha, theta, and delta frequency bands; their analysis of these bands reported an accuracy of around 90%. Myrden et al. [19] used the FFT to evaluate frequency domain features and classified them with SVM, LDA, and naive Bayes classifiers; their model yielded the highest accuracies of 71.6%, 74.8%, and 84.8% for frustration, fatigue, and attention levels using the LDA classifier. In another method, by Liu et al. [22], an FFT- and SVM-based model yielded an accuracy of 76.82%. Li et al. [20] used an SAM model and obtained an average accuracy of 57.03% with the KNN classifier. Nuamah et al. [21] presented a combination of STFT and SVM for feature extraction and classification; their method obtained an accuracy of 93.33% using the radial basis function kernel. Ket et al. [23] automatically identified three tasks, namely attention, no attention, and rest, using two experiments (ball playing or a walking cartoon); with sample entropy and linear features classified using an SVM, their method yielded accuracies of 76.19% and 85.24% for the two experiments with sample entropy features. Wang et al. [24] fed features extracted by filtering and ICA into an SVM classifier and achieved 86.2% and 84.6% accuracy in the classification of driving tasks and math-related activities. Djamal et al. [25] computed non-wavelet- and wavelet-based features and classified them with an SVM classifier; their method provided accuracies in the ranges of 44–58% and 69–83%, respectively. Hamadicharef et al. [26] developed an attention/non-attention classification model based on filters, common spatial patterns, and a Fisher linear discriminant, with an accuracy of 89.4%. Mardi et al. [27] reported an accuracy of 83.3% using chaotic features based on log energy and Higuchi's and Petrosian's fractal dimensions with artificial neural network classifiers. Richer et al. [28] used the power of frequency bands, naive and entropy scores, and a binary classification model to obtain sensitivities of 82% and 80.4% and specificities of 82.8% and 80.8% for the focus and relax scores, respectively. The methods discussed above have been tested on different datasets for mental state classification. The proposed method was compared with the work of Aci et al. [29] and Zhang et al. [30] on the same dataset, as shown in Table 6. Aci et al. used STFT-based feature extraction to compute different feature sets; ANFIS, SVM, and KNN classifiers were employed to classify 154 features, with accuracies of 81.55%, 77.76%, and 91.72%, respectively. The method of Zhang et al. [30] used a deep-learning-based convolutional neural network (CNN) and provided an accuracy of 96.4%. Kumar et al. [37] explored PSD analysis using FFT-based feature extraction and a KNN classifier; their channel-wise and grouped-channel analyses yielded accuracies of 80% and 97.5%. Khare et al. [36] used RDWT wavelet analysis with statistical feature extraction; the classification of these features resulted in an accuracy of 91.77% using a bagged tree classifier. Rastogi and Bhateja [38] performed artifact and noise elimination using SWT and ICA but did not report classification accuracy. In our method, we used ensemble-based decomposition and the extraction of nonlinear features. The individual analyses of the MDWT, TQWT, and FAWT features yielded accuracies of 88.27%, 89.02%, and 90.1%. Fused feature analysis yielded accuracies of 90.98%, 88.62%, and 89.61% for the TQWT/FAWT, TQWT/MDWT, and MDWT/FAWT feature fusions using the TFCV technique. Combined fused feature analysis using TFCV and IMV resulted in accuracies of 92.45% and 97.8%. This analysis shows that our developed model surpasses the performance of existing state-of-the-art techniques, demonstrating its efficacy.