Article

Novel Machine Learning-Based Brain Attention Detection Systems

Junbo Wang and Song-Kyoo Kim *
Faculty of Applied Sciences, Macao Polytechnic University, R. de Luis Gonzaga Gomes, Macao, China
* Author to whom correspondence should be addressed.
Information 2025, 16(1), 25; https://doi.org/10.3390/info16010025
Submission received: 28 November 2024 / Revised: 24 December 2024 / Accepted: 3 January 2025 / Published: 5 January 2025
(This article belongs to the Special Issue Real-World Applications of Machine Learning Techniques)

Abstract

Electroencephalography (EEG) can reflect changes in brain activity under different states. The brain's electrical signals exhibit varying amplitudes and frequencies, and these variations are closely linked to different states of consciousness, influencing human internal and external behaviors, emotions, and learning performance. These signals can also facilitate the assessment of a person's level of attention, that is, the ability to consciously focus on something. Research on brain attention aids in understanding the mechanisms underlying human cognition and behavior. Based on the characteristics of EEG signals, this research identifies the most effective method for detecting brain attention by adapting various preprocessing and machine learning techniques. The results of our analysis on a publicly available dataset indicate that KNN with the feature importance (FI) feature selection method performed best, achieving 99.56% accuracy, 99.44% recall, and 99.67% precision with a rapid training time.

1. Introduction

Electroencephalography (EEG) is a technique for capturing an electrogram that reflects the spontaneous electrical activity of the brain. The biosignals obtained through EEG are indicative of the postsynaptic potentials associated with pyramidal neurons located in the neocortex and allocortex [1]. EEG is capable of reflecting changes in brain activity. From a frequency perspective, EEG waves are mainly classified into five types: alpha, beta, theta, delta, and gamma waves. The amplitudes of the brain's electrical signals differ under various states, and these patterns of consciousness shape human internal and external behaviors, emotions, and learning performance. Furthermore, they can be employed to measure a person's level of attention, that is, an individual's ability to consciously concentrate on something. EEG is capable of identifying abnormal electrical discharges, including sharp waves, spikes, and spike-and-wave complexes, which are typical indicators of epilepsy; this makes EEG a valuable tool in medical diagnostics. It can monitor the onset and spatio-temporal progression (both location and timing) of seizures, as well as detect status epilepticus. Furthermore, EEG is employed in diagnosing various conditions, such as sleep disorders, assessing the depth of anesthesia, evaluating coma, identifying encephalopathies, monitoring cerebral hypoxia after cardiac arrest, and confirming brain death. Historically, it served as a primary diagnostic method for tumors, strokes, and other focal brain disorders [2]. The application of EEG has decreased with the emergence of high-resolution anatomical imaging methods such as magnetic resonance imaging (MRI) and computed tomography (CT). Nonetheless, despite its limited spatial resolution, EEG still plays a vital role in research and diagnosis. As one of the few portable techniques available, it provides temporal resolution in the millisecond range, a feature that CT, PET, and MRI do not possess [3]. In addition, research on brain attention can identify strategies to improve the efficiency and quality of human cognition and behavior. For example, training and regulating brain attention can enhance performance in learning, work, and everyday life, encouraging progress in fields such as neuroscience, cognitive psychology, and education [4]. Currently, various machine learning (ML) technologies are rapidly evolving [5]. ML offers numerous advantages, including robust data processing, automated feature extraction, strong predictive abilities, and high interpretability. These benefits make ML technology valuable for supporting research and applications focused on brain attention.
Recently, EEG-based emotion and brain attention detection has been studied extensively. Numerous studies have proposed ML techniques to develop brain attention and emotion detection systems [6]. Various ML models, including support vector machines (SVMs), convolutional neural network long short-term memory (CNN-LSTM) hybrids, and deep convolutional neural networks (DCNs), have been employed to create advanced EEG-based emotion and/or attention detection systems [6,7,8]. Other studies have adapted ML algorithms to build learning models for analyzing and processing EEG signals, for example those recorded from astronauts [9]. Research on brain attention can enhance our understanding of the mechanisms underlying human cognition and behavior; it allows us to identify problems such as cognitive fatigue, attention deficits, and multitasking overload, providing a basis for their prevention and treatment [7,8]. This study uses a publicly accessible EEG dataset [8]. The data are preprocessed according to the characteristics of EEG signals with techniques including Butterworth bandpass filtering, baseline drift removal, short-time Fourier transform (STFT), and inverse STFT. Subsequently, we use various ML methods to train models and evaluate them over multiple training iterations to determine the most effective approach for detecting brain attention, which can then be applied to different scenarios.
This article is structured into four additional sections. Section 2 lays out the theoretical foundations of our study, including the short-time Fourier transform (STFT), the candidate machine learning algorithms, and contemporary feature reduction techniques, along with a concise overview of the performance metrics used in our system. Section 3 evaluates the candidate models and feature reduction methods and presents the performance results. Section 4 presents the design of the optimized EEG brain attention detection system based on these experiments, together with its enhanced performance results. Finally, Section 5 summarizes the key contributions of this paper.

2. Preliminaries of Enhanced EEG-Based Attention Detection

The preprocessing steps taken prior to the primary processing flow, including the correction of baseline drift and the elimination of noise frequencies, follow the methods detailed in past research; this alignment ensures that the preprocessing stage effectively prepares the data for subsequent analysis and processing [7,8]. This section also presents foundational knowledge of the various ML models and the candidate feature selection methods. These feature selection methods aim to identify the most relevant features for an enhanced EEG-based brain attention detection system, ultimately improving its accuracy and efficiency.

2.1. Preprocessing for EEG Signal Improvement

Prior to feeding data into a machine learning (ML) model, preprocessing is essential to improve classification accuracy. EEG signals are acquired using electrodes positioned on the scalp, allowing for the detection of seizure onset and spatio-temporal evolution, as well as the identification of status epilepticus. Additionally, EEG can identify abnormal electrical discharges, such as sharp waves, spikes, and spike-and-wave complexes. The electrodes are positioned according to the 10–20 system [10], which provides a standardized method for electrode placement based on specific anatomical landmarks. Our EEG dataset has a total of 14 channels, but only 7 channels (F3, F4, Fz, C3, C4, Cz, and Pz) are used in this study.
A Butterworth bandpass filter was implemented to restrict the signal to the frequency range of 0.2 Hz to 43 Hz, and power line interference (PLI) noise was removed (see (a) in Figure 1). Unlike other types of filters, the Butterworth filter exhibits a smooth transition from the passband to the stopband; the British engineer and physicist Stephen Butterworth first described this filter in a 1930 paper [11]. In addition, the short-time Fourier transform (STFT) is effective in converting the signal to the frequency domain. STFT is a Fourier-based transform used to identify the sinusoidal frequency and phase content of local segments of a signal as it varies over time [12].
Practically, the STFT computation entails splitting a longer time signal into shorter, equal-length segments and then performing a Fourier transform on each segment separately. This unveils the Fourier spectrum of each segment, providing a time–frequency representation of the signal (see (b) in Figure 1) that makes further processing in the frequency domain easier. For each channel, we retained the most important frequencies from 0 to 18 Hz, binned the data at 0.5 Hz intervals (resulting in a total of 36 bins), and applied a 15 s sliding window. The purpose of these steps was to compress the data and make the feature spaces of the different states more distinct, thereby enhancing model performance. The data of a single bin after the above preprocessing steps are shown in Figure 2.
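As a concrete illustration, the sketch below implements this preprocessing chain for a single channel with SciPy. The sampling rate (128 Hz), power-line frequency (50 Hz), filter order, and STFT segment length are illustrative assumptions, not the exact settings used in this study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt, stft

FS = 128  # assumed sampling rate (Hz)

def preprocess_channel(x, fs=FS, win_s=15):
    """Return an (n_windows, 36) feature matrix for one EEG channel."""
    # 1) Fourth-order Butterworth band-pass, 0.2-43 Hz, applied zero-phase.
    sos = butter(4, [0.2, 43.0], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)
    # 2) Narrow notch to suppress power-line interference (50 Hz assumed).
    b, a = iirnotch(50.0, Q=30.0, fs=fs)
    x = filtfilt(b, a, x)
    # 3) STFT with 2 s segments -> 0.5 Hz frequency resolution, 1 s hop.
    f, t, Z = stft(x, fs=fs, nperseg=2 * fs, noverlap=fs)
    mag = np.abs(Z)
    # 4) Keep the 36 bins covering 0.5-18 Hz.
    mag = mag[(f > 0) & (f <= 18.0)]          # shape: (36, n_frames)
    # 5) 15 s sliding window: average 15 consecutive 1 s frames.
    n = mag.shape[1] - win_s + 1
    return np.stack([mag[:, i:i + win_s].mean(axis=1) for i in range(n)])
```

With a 2 s segment the STFT grid lands exactly on 0.5 Hz multiples, so the 36 bins of step 4 fall out of the transform directly rather than requiring a separate binning pass.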

2.2. Various Machine Learning Algorithms

A range of machine learning algorithms have been evaluated to identify the optimal model for EEG-based attention detection. For our analysis, we selected ensemble-based learning models and traditional machine learning models [13,14,15]. Six ML algorithms were applied to the same datasets. This research considers these ML models with balanced datasets:
  • Support vector machine (SVM) [15] is utilized for classification and regression tasks. However, it is not particularly effective for datasets that have imbalanced class distributions, noise, and overlapping class samples.
  • ResNet (residual neural network) [16] allows weight layers to learn residual functions relative to the inputs of previous layers, rather than learning functions independently. Extensive empirical evidence indicates that these residual networks are more straightforward to optimize and can achieve significantly higher accuracy with increased depth. Consequently, this model simplifies the training of networks that exceed the depth of those previously utilized.
  • GoogLeNet is a form of convolutional neural network that is based on the Inception model. It features Inception modules, which allow the network to select from different sizes of convolutional filters in each block. The Inception network arranges these modules in layers, sometimes using max-pooling layers with a stride of 2 to reduce the grid resolution by half [17].
  • Decision Tree (DT) is a supervised learning method used in many fields. Models where the target variable can take on discrete values are referred to as classification trees; in these structures, leaves represent the class labels, while branches symbolize the conjunctions of features that lead to those labels [18].
  • Random Forest (RF) develops a more powerful classifier than a single decision tree by creating a collection of decision trees. It is an ensemble learning approach made up of several decision trees generated through random feature selection and bootstrapping techniques [19]. The ensemble aggregates the predictions of its trees, typically by majority vote, which reduces variance compared with a single decision tree.
  • K-Nearest Neighbor (KNN) [14] is a model that constructs the classifier function by voting among its local neighboring data points [20,21]. It finds the k closest data points to a given sample in the feature space and assigns the majority class among these neighbors to the sample.
For neural networks such as ResNet-18 and GoogLeNet, we utilized the PyTorch framework to construct and train the networks, and we also employed pre-trained models provided by the PyTorch official website. To accommodate our binary classification task, we adjusted the output size of the last fully connected layer in both models to 2. The remaining hyperparameters are listed in Table 1.
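A minimal sketch of this adaptation, assuming the pretrained weights are loaded through torchvision and following the Table 1 settings (the weight tag and the T_max choice for the scheduler are common-usage assumptions, not the authors' exact code):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, googlenet

# Load ImageNet-pretrained backbones and replace the last fully connected
# layer so each network outputs two logits (focused vs. unfocused).
resnet = resnet18(weights="IMAGENET1K_V1")
resnet.fc = nn.Linear(resnet.fc.in_features, 2)
gnet = googlenet(weights="IMAGENET1K_V1")
gnet.fc = nn.Linear(gnet.fc.in_features, 2)

# Training setup per Table 1: cross-entropy loss, Adam with an initial
# learning rate of 0.001, and cosine annealing over the 50 epochs.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(resnet.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
```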
For the other models, the hyperparameters are listed in Table 2 and translate directly into standard library calls, as sketched below. To ensure the reproducibility of training, the random state parameter was set to 42. The GPU used for training was an NVIDIA RTX 3060 with 3840 CUDA cores and 6 GB of GDDR6 memory; the CPU was an Intel i7-11800H, and the machine had 32 GB of RAM.
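Assuming scikit-learn implementations (a library choice the text does not state explicitly), the Table 2 configuration corresponds to the following model definitions:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

SEED = 42  # fixed random state for reproducibility, as noted above
models = {
    "DT":  DecisionTreeClassifier(criterion="entropy", random_state=SEED),
    "RF":  RandomForestClassifier(criterion="entropy", random_state=SEED),
    "KNN": KNeighborsClassifier(n_neighbors=5),  # KNN itself is deterministic
    "SVM": SVC(kernel="rbf", probability=True, random_state=SEED),
}
```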
The results of reproducing the above models with the EEG dataset are presented in Section 3.1. It is important to note that the performance outcomes of these ML models are not identical to those from the original research.

2.3. Various Feature Reduction Methods

Due to their simplicity and efficiency, feature selection (or reduction) methods have been widely adopted for addressing high-dimensional problems [25]. Feature selection facilitates data comprehension, decreases computational demands, mitigates the curse of dimensionality, and boosts predictor performance [26]. Feature selection fundamentally involves selecting a subset of input variables that effectively represent the data while reducing the influence of noise and irrelevant variables, leading to robust predictive results [26,27].
  • Analysis of variance (ANOVA) compares means across various groups and is grounded in data variance analysis. This method is frequently applied in feature selection to enhance the processes of inference and decision-making. Earlier studies have incorporated this approach [28].
  • The feature importance method (FI) evaluates and quantifies the significance of features within a machine learning model, assisting users in comprehending the essential role that particular features play in the predictive accuracy of the model [29].
  • Linear correlation coefficient (LCC) is a statistical measure employed to quantify the strength and direction of the linear relationship between two variables, providing valuable insights into how one variable tends to increase or decrease in a consistent manner relative to changes in the other variable [30].
  • Principal component analysis (PCA) is a linear method for reducing dimensionality by transforming a large number of variables into a smaller subset that still captures most of the significant information from the original dataset. This transformation helps in reducing computational complexity and improving the interpretability of the data [31,32].
The four feature reduction methods discussed above were applied to select features for training the machine learning models. The setup of each feature reduction method is provided in Table 3.
The outputs of these methods were analyzed to identify the most effective feature reduction strategy for further training. Additionally, resampling techniques were employed to eliminate redundant instances from the dataset.
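A sketch of how the four Table 3 selection rules can be realized in Python; the use of a random forest to estimate feature importances is our assumption, since the text does not name the underlying estimator:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

def anova_mask(X, y, alpha=0.05):
    # Keep features whose ANOVA p-value is at most alpha (Table 3).
    _, p = f_classif(X, y)
    return p <= alpha

def importance_mask(X, y, seed=42):
    # Keep features whose importance reaches the mean importance.
    imp = RandomForestClassifier(random_state=seed).fit(X, y).feature_importances_
    return imp >= imp.mean()

def lcc_mask(X, y):
    # Keep features whose |Pearson r| with the label reaches the mean |r|.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.abs(r) >= np.abs(r).mean()

def pca_transform(X):
    # Minka's MLE picks the number of retained components automatically [33].
    return PCA(n_components="mle").fit_transform(X)
```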

2.4. Performance Measures

The performance of the selected models was assessed using a confusion matrix that compares actual observations with model predictions, from which metrics such as accuracy, precision, recall, and F1-score were calculated across the classes. True Positives (TPs) denote the number of correctly classified positive instances, while True Negatives (TNs) indicate the number of accurately classified negative instances. False Positives (FPs) refer to instances incorrectly classified as positive, and False Negatives (FNs) represent instances misclassified as negative [34]. Let N denote the total number of samples; the evaluation metrics can then be expressed by the following formulas:
$$\mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N}, \tag{1}$$
$$\mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}, \tag{2}$$
$$\mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}, \tag{3}$$
$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{4}$$
In this research, the above four metrics were recorded for each model. The specific results are detailed in Section 3.1 and Section 3.2.
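These formulas translate directly into code. The small check below plugs in the confusion counts reported for the optimized system in Section 4 (TP = 2683, FP = 9, FN = 15, TN = 2675) and reproduces the stated precision, recall, and accuracy:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Confusion counts from the optimized system in Section 4:
print(f"{precision(2683, 9):.4f}")           # 0.9967
print(f"{recall(2683, 15):.4f}")             # 0.9944
print(f"{accuracy(2683, 2675, 9, 15):.4f}")  # 0.9955 (= 5358/5382)
```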
The Combined Bivariate Performance Measure (CBPM) is an innovative metric designed to assess a system or process by integrating two conflicting performance measures [35]. The definition of the CBPM function is as follows:
$$\xi(x) = \varphi(x) \cdot \rho(x), \tag{5}$$
where
$$\varphi(x) = \frac{f(x) - f_0}{\Delta f + \epsilon}, \qquad \rho(x) = \frac{g(x) - g_0}{\Delta g + \epsilon}.$$
Our study focuses on accuracy and machine learning training time as the performance measures. The independent variable x influences both performance functions, f(x) and g(x). Additionally, f(x) is assumed to be the primary performance function, considered more important than the other. Let us define the set X as follows:
$$X = \{x_0, x_1, \ldots, x_b\}. \tag{6}$$
Then, the performance functions are defined as follows:
$$F = \{f_0, f_1, \ldots, f_b\}, \qquad G = \{g_0, g_1, \ldots, g_b\}, \tag{7}$$
and
$$f_k = f(x_k), \quad k = 1, \ldots, b-1, \qquad f_0 = \min F, \quad f_b = \max F. \tag{8}$$
Then, we have
$$g_k = g(x_k), \quad k = 0, 1, \ldots, b. \tag{9}$$
From (6) to (9), the set of the discrete CBPM can be defined as follows:
$$\Xi = \{\xi_0, \xi_1, \ldots, \xi_b\}, \qquad \xi_k = \xi(x_k), \quad k = 0, 1, \ldots, b. \tag{10}$$
Since the trend of the combined performance measure in this research is bigger-is-better (BIB) [35], the optimum x* is determined as follows from (6) to (11):
$$x^{*} = \arg\max_{x_k \in X} \, \xi(x_k), \qquad x_k \in \{x_0, \ldots, x_b\}, \tag{11}$$
where the performance ratio function φ(x) follows a BIB trend for accuracy, while the performance ratio function ρ(x) exhibits a smaller-is-better (SIB) trend for training time. Consequently, the optimal value of the combined single performance measure ξ(x*) should be maximized.
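The discrete CBPM in (5)–(11) can be computed numerically. The sketch below is one plausible reading: the BIB accuracy ratio follows the normalization in (8), while the SIB training-time ratio is reflected so that shorter times score higher; the exact normalization in [35] may differ. The Table 4 values (with KNN's ≈0.00 min entered as 0.001) illustrate the usage.

```python
import numpy as np

def cbpm(f, g, eps=1e-9):
    """Discrete CBPM: f is bigger-is-better (accuracy), g is smaller-is-better
    (training time). Both ratios are normalized to [0, 1] and multiplied."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    phi = (f - f.min()) / (f.max() - f.min() + eps)  # BIB ratio, f0 = min F
    rho = (g.max() - g) / (g.max() - g.min() + eps)  # reflected SIB ratio
    return phi * rho

acc   = [99.85, 99.83, 99.78, 96.69, 99.24, 93.59]  # accuracies from Table 4
t_min = [0.001, 60.08, 48.70, 0.150, 0.586, 0.131]  # training minutes, Table 4
xi = cbpm(acc, t_min)
best = int(np.argmax(xi))  # -> 0, i.e., KNN, consistent with Section 3.1
```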

3. Performance Evaluations

In this section, we utilize the original dataset from earlier research [8], reconstructing it into separate training and test datasets. The original dataset consists of 25 h of EEG recordings collected from five participants engaged in a low-intensity control task. Participants operated a computer-simulated train using the Microsoft Train Simulator program, controlling the train for 35 to 55 min along a largely featureless route in each experiment [8]. The two EEG datasets contained 26,910 samples in total, with 80% of the data used as the training set and 20% as the test set. The random state was set to 42, and the training time was recorded before standardizing the dataset. We used the test set to calculate the accuracy, recall, precision, and F1-score, with cross-validation applied during training. The recordings span seven days for each of the five subjects, except for the last subject, who had only six days of recordings. For each day, the initial 10 min of data were designated as focused attention, while the subsequent 10 min were designated as unfocused attention. The clear distinction between the two data spaces is attributed to the data preprocessing techniques, specifically binning and the sliding time window. We also attempted to use data directly after Butterworth filtering (without STFT) as model input, as well as data directly after STFT (without frequency selection, binning, or the sliding time window); both approaches resulted in poor accuracy, with a maximum of 79.9% achieved by GoogLeNet.
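A sketch of the split and evaluation just described, where X and y stand for the preprocessed feature matrix and the binary labels (hypothetical names); the stratification and five-fold setting are our assumptions:

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 80/20 split with the fixed random state reported above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
cv_acc = cross_val_score(knn, X_train, y_train, cv=5)  # cross-validation scores
test_acc = knn.score(X_test, y_test)                   # held-out test accuracy
```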

3.1. Result Comparisons for ML Algorithms

The following discussion examines the performance results on the test dataset in detail. Table 4 lists the performance metrics of the six machine learning models on the original (non-reduced) features. Notably, all models achieved accuracies ranging from 93.59% to 99.85%. The KNN model excelled in both accuracy and training duration. Conversely, the GoogLeNet model had the longest training time, taking about an hour (60.08 min) to complete.
KNN was selected for the EEG brain attention detection system due to its optimal accuracy and training time. Although the CBPM has been applied for two conflicting performance measures [35], it might be unnecessary in this case, as KNN demonstrates outstanding performance for both conflicting measures: accuracy (BIB) and training time (SIB) [35].

3.2. Result Comparisons for Feature Reduction

Building on the accuracy results of the previous section, various feature reduction methods (ANOVA, FI, LCC, and PCA) were applied to reduce the number of features in the dataset. These methods were chosen to ensure that only the most relevant features were retained, thereby improving the efficiency and effectiveness of model training.
Table 5. Accuracy comparisons of different feature reduction methods.

Model       Original (%)   ANOVA (%)   Feat. Imp. (%)   LCC (%)   PCA (%)
KNN         99.85          99.78       99.55            99.57     99.85
GoogLeNet   99.83          99.72       99.65            99.71     99.78
ResNet-18   99.78          99.89       99.35            99.41     99.81
SVM         99.69          96.30       93.55            93.48     96.71
RF          99.23          99.25       98.84            98.74     98.67
DT          93.59          93.61       93.18            92.51     88.27
The accuracy scores obtained using PCA, shown in Table 5, are generally equal or close to those from the original dataset. Nevertheless, the PCA method was removed from our subsequent model selection experiments because the results obtained by adapting a commonly used automatic dimensionality selection technique for PCA [33] were not good (i.e., only one feature was reduced); ANOVA was also removed from our selection for the same reason. In contrast, the LCC and FI methods yielded only one or two higher accuracy scores. From the data in Table 4 and Table 5, it can be seen that the ResNet-18 model improved in accuracy when using the ANOVA method, while the other ML models saw a decrease in accuracy, which remained acceptable and showed no significant differences under a hypothesis test. The training times for most models were shorter than their original durations. It is important to note that feature reduction techniques such as FI and LCC might still be valuable because of other performance metrics. For example, LCC results in a minor drop in accuracy (which remains acceptable compared to the original dataset) while requiring only around 43% of the original data size. The FI technique requires an even smaller data size, using only around one third of the original (see Figure 3). The original dataset had 252 features before feature reduction; after applying the FI method, 82 features were retained, meaning that 67.5% of the features were removed. As illustrated in the figure, even with only 32.5% of the original EEG features, the model achieved accuracy similar to that obtained with the initial 252 features. These results underscore the potential of aggressive feature reduction to maintain the required accuracy.

4. Robust Optimization for Enhanced EEG Brain Attention Detection System Design

Based on our performance evaluation, the enhanced EEG brain attention detection system can be optimized as illustrated in Figure 4, which depicts the multi-stage process of the enhanced ML-based binary attention detection system.
An optimized EEG brain attention detection system can be developed by combining appropriate preprocessing methods (the Butterworth filter and STFT) with the KNN machine learning algorithm and FI-based feature reduction. The first stage is an input layer with seven EEG channels. The next stage is a preprocessing step involving Butterworth filtering and STFT. This is followed by the KNN model, and the final stage is binary attention detection with two states, labeled "Not-Focus (0)" and "Focus (1)". The diagram represents a pipeline for analyzing EEG data to detect attentional states, with potential applications such as brain–computer interfaces and cognitive monitoring.
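A compact sketch of the classifier stage of this pipeline, assuming scikit-learn components; `fi_mask` stands for the boolean feature importance mask from Section 2.3, and `X_train`/`y_train`/`X_test` are the hypothetical names carried over from the earlier sketches:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize the FI-selected features, then classify with KNN.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train[:, fi_mask], y_train)
pred = clf.predict(X_test[:, fi_mask])   # 1 = Focus, 0 = Not-Focus
```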

Performance Results for Optimized EEG Brain Attention Detection

A receiver operating characteristic (ROC) curve is a graphical tool that demonstrates the performance of a binary classifier model at various threshold levels [36], and the Area Under the ROC Curve (AUC) is a vital metric for evaluating a classification model, as it summarizes effectiveness across all threshold settings. In (a) of Figure 5, the AUC performance of the various ML algorithms with the optimized feature reduction method is illustrated. The ROC curve reflects how accurately the ML model distinguishes between the two categories, and the AUC represents the total two-dimensional area beneath the ROC curve. The KNN algorithm clearly achieves the highest AUC value for the enhanced EEG brain attention detection system.
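The ROC and AUC values in (a) of Figure 5 can be reproduced with standard library calls; continuing the sketch above (`clf`, `fi_mask`, and the test arrays are the hypothetical names introduced earlier):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Scores are the predicted probabilities of the positive ("Focus") class.
scores = clf.predict_proba(X_test[:, fi_mask])[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)  # 1.0 corresponds to the reported 100% AUC
```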
Regarding accuracy, the enhanced EEG brain attention detection system shows the following performance metrics, as indicated by the confusion matrix in (b) of Figure 5: overall accuracy of 99.56% (=5358/5382), recall of 99.44% (=2683/2698), and precision of 99.67% (=2683/2692). This study seeks to develop efficient EEG-based systems utilizing ML with potential applications in fitness-for-work assessments within a wide range of security sectors [37]. Recent findings have emerged from the adaptation of the dataset utilized in this research [8]. These methods have demonstrated notable improvements in accuracy compared to the original approach. Table 6 presents the results of previous works, detailing the applied machine learning models, the classification tasks selected, and their corresponding accuracy. Additionally, the table includes our results for comparative analysis. The performance of an ML model was observed to be high regardless of whether feature reduction was implemented. Nevertheless, feature reduction remains significantly important in practical applications. It not only minimizes the complexity of the data and the storage space requirements but also reduces the computational burden, facilitating efficient operation on devices with limited storage and processing capabilities, such as smart earphones and smart bands commonly used for EEG data collection. Even after feature reduction, the accuracy remained comparable to that of the non-reduced data. Thus, feature reduction is deemed valuable in practical applications.

5. Conclusions

EEG brain attention systems present several benefits compared to traditional systems. Nevertheless, the use of EEG data varies with the specific application, so the nature of the system differs from case to case. For our enhanced EEG-based brain attention detection system, the performance of the machine learning approach was visualized using various metrics, including a confusion matrix and an AUC graph. EEG samples from seven channels were employed to train the models, and the KNN algorithm with the FI method yielded the best results in terms of accuracy and training speed. Using the optimized parameters of KNN, the proposed brain attention detection system achieves an accuracy of up to 99.56%. Beyond accuracy, our experiments show that the KNN model, with proper preprocessing and feature reduction, attains an AUC of 100% and an F1-score of 0.996. This indicates that the newly proposed preprocessing method improves upon previous studies. It is noted that ablation studies of the base blocks in ResNet were not fully explored in this research; future studies should consider related work for comparison with the original ResNet. This research aims to develop efficient EEG-based systems using machine learning, which could potentially be applied to fitness-for-work assessments across various domains.

Author Contributions

Conceptualization, S.-K.K.; methodology, S.-K.K.; software, J.W.; data reshaping, J.W.; writing—original draft, S.-K.K. and J.W.; writing—review and editing, S.-K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Macao Polytechnic University (MPU), under Grant RP/FCA-05/2024.

Institutional Review Board Statement

Not applicable; this study used a publicly available dataset.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in the current study are available on Kaggle [8] (https://www.kaggle.com/, accessed on 30 December 2024).

Acknowledgments

This paper was revised by using AI/ML-assisted tools. The corresponding source codes are publicly available on GitHub (https://github.com/gennwolf/EEGAttention, accessed on 5 January 2025) for users to perform demonstrations so that they can fully understand the algorithms in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Amzica, F.; Lopes da Silva, F.H. Cellular Substrates of Brain Rhythms. In Niedermeyer’s Electroencephalography: Basic Principles, Clinical Applications, and Related Fields; Schomer, D.L., Lopes da Silva, F.H., Eds.; Oxford University Press: Oxford, UK, 2017; pp. 20–62. [Google Scholar]
  2. Goldman, L.; Schafer, A.I. Cecil Medicine; Elsevier: Amsterdam, The Netherlands, 2011. [Google Scholar]
  3. Deschamps, A.; Ben Abdallah, A.; Jacobsohn, E.; Saha, T.; Djaiani, G.; El-Gabalawy, R.; Overbeek, C.; Palermo, J.; Courbe, A.; Cloutier, I.; et al. Electroencephalography-Guided Anesthesia and Delirium in Older Adults After Cardiac Surgery: The ENGAGES-Canada Randomized Clinical Trial. JAMA 2024, 332, 112–123. [Google Scholar] [CrossRef] [PubMed]
  4. Orovas, C.; Sapounidis, T.; Volioti, C.; Keramopoulos, E. EEG in Education: A Scoping Review of Hardware, Software, and Methodological Aspects. Sensors 2025, 25, 182. [Google Scholar] [CrossRef]
  5. Dhande, S.; Kamble, A.; Gundewar, S.; Naresh Babu, N.; Kumar, P. Overview of Machine Learning for Bioengineering EEG Signal Processing. In Proceedings of the 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), Coimbatore, India, 20–30 August 2024; pp. 611–616. [Google Scholar]
  6. Li, M.; Qiu, M.; Kong, W.; Zhu, L.; Ding, Y. Fusion Graph Representation of EEG for Emotion Recognition. Sensors 2023, 23, 1404. [Google Scholar] [CrossRef] [PubMed]
  7. Atilla, F.; Alimardani, M. EEG-based Classification of Drivers Attention using Convolutional Neural Network. In Proceedings of the 2021 IEEE 2nd International Conference on Human-Machine Systems (ICHMS), Magdeburg, Germany, 8–10 September 2021; pp. 1–4. [Google Scholar]
  8. Aci, C.I.; Kaya, M.; Mishchenko, Y. Distinguishing mental attention states of humans via an EEG-based passive BCI using machine learning methods. Expert Syst. Appl. 2019, 134, 153–166. [Google Scholar] [CrossRef]
  9. Pei, Y.; Luo, Z.; Yan, Y.; Yan, H.; Jiang, J.; Li, W.; Xie, L.; Yin, E. Data Augmentation: Using Channel-Level Recombination to Improve Classification Performance for Motor Imagery EEG. Front. Hum. Neurosci. 2021, 15, 645952. [Google Scholar] [CrossRef]
  10. Jasper, H.H. Report of the Committee on Methods of Clinical Examination in Electroencephalography: 1957. Electroencephalogr. Clin. Neurophysiol. 1958, 10, 370–375. [Google Scholar]
  11. Butterworth, S. On the Theory of Filter Amplifiers. Exp. Wirel. Wirel. Eng. 1930, 7, 536–541. [Google Scholar]
  12. Goyal, D.; Pabla, B. Condition based maintenance of machine tools—A review. CIRP J. Manuf. Sci. Technol. 2015, 10, 24–35. [Google Scholar] [CrossRef]
  13. Lieber, C.; Mahadevan-Jansen, A. Automated Method for Subtraction of Fluorescence from Biological Raman Spectra. Appl. Spectrosc. 2003, 57, 1363–1367. [Google Scholar] [CrossRef]
  14. Alarfaj, F.K.; Malik, I.; Khan, H.U.; Almusallam, N.; Ramzan, M.; Ahmed, M. Credit Card Fraud Detection Using State-of-the-Art Machine Learning and Deep Learning Algorithms. IEEE Access 2022, 10, 39700–39715. [Google Scholar] [CrossRef]
  15. Zhao, J.; Lui, H.; McLean, D.; Zeng, H. Automated Autofluorescence Background Subtraction Algorithm for Biomedical Raman Spectroscopy. Appl. Spectrosc. 2007, 61, 1225–1232. [Google Scholar] [CrossRef]
  16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  17. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842. [Google Scholar]
  18. Studer, M.; Ritschard, G.; Gabadinho, A.; Müller, N.S. Discrepancy Analysis of State Sequences. Sociol. Methods Res. 2011, 40, 471–510. [Google Scholar] [CrossRef]
  19. Randhawa, K.; Loo, C.K.; Seera, M.; Lim, C.P.; Nandi, A.K. Credit Card Fraud Detection Using AdaBoost and Majority Voting. IEEE Access 2018, 6, 14277–14284. [Google Scholar] [CrossRef]
  20. Maimon, O.; Rokach, L. Data Mining and Knowledge Discovery Handbook; Springer: New York, NY, USA, 2010. [Google Scholar]
  21. Taghizadeh-Mehrjardi, R.; Nabiollahi, K.; Minasny, B.; Triantafilis, J. Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran. Geoderma 2015, 253–254, 67–77. [Google Scholar] [CrossRef]
  22. Mao, A.; Mohri, M.; Zhong, Y. Cross-Entropy Loss Functions: Theoretical Analysis and Applications. arXiv 2023, arXiv:cs.LG/2304.07288. [Google Scholar]
  23. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:cs.LG/1412.6980. [Google Scholar]
  24. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
  25. Akogul, S. A Novel Approach to Increase the Efficiency of Filter-Based Feature Selection Methods in High-Dimensional Datasets With Strong Correlation Structure. IEEE Access 2023, 11, 115025–115032. [Google Scholar] [CrossRef]
  26. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  27. Chuang, L.Y.; Chang, H.W.; Tu, C.J.; Yang, C.H. Improved binary PSO for feature selection using gene expression data. Comput. Biol. Chem. 2008, 32, 29–38. [Google Scholar] [CrossRef]
  28. Wu, C.; Yan, Y.; Cao, Q.; Fei, F.; Yang, D.; Lu, X.; Xu, B.; Zeng, H.; Song, A. sEMG Measurement Position and Feature Optimization Strategy for Gesture Recognition Based on ANOVA and Neural Networks. IEEE Access 2020, 8, 56290–56299. [Google Scholar] [CrossRef]
  29. Casalicchio, G.; Molnar, C.; Bischl, B. Visualizing the Feature Importance for Black Box Models. In Proceedings of the Machine Learning and Knowledge Discovery in Databases; Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G., Eds.; Springer: Cham, Switzerland, 2019; pp. 655–670. [Google Scholar]
  30. Biesiada, J.; Duch, W. Feature Selection for High-Dimensional Data—A Pearson Redundancy Based Filter; Springer: Berlin/Heidelberg, Germany, 2007; pp. 242–249. [Google Scholar]
  31. Evgeniou, T.; Pontil, M. Support Vector Machines: Theory and Applications. In Machine Learning and Its Applications: Advanced Lectures; Springer: Berlin/Heidelberg, Germany, 2001; pp. 249–257. [Google Scholar]
  32. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef] [PubMed]
  33. Minka, T. Automatic Choice of Dimensionality for PCA; Technical Report 514; MIT Media Lab Vision and Modeling Group: Cambridge, MA, USA, 2001; pp. 577–583. [Google Scholar]
  34. Feng, X.; Kim, S.K. Novel Machine Learning Based Credit Card Fraud Detection Systems. Mathematics 2024, 12, 1869. [Google Scholar] [CrossRef]
  35. Kim, S.K. Combined Bivariate Performance Measure. IEEE Trans. Instrum. Meas. 2024, 73, 1009404. [Google Scholar] [CrossRef]
  36. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  37. Serra, C.; Rodriguez, M.C.; Delclos, G.L.; Plana, M.; López, L.I.G.; Benavides, F.G. Criteria and methods used for the assessment of fitness for work: A systematic review. Occup. Environ. Med. 2007, 64, 304–312. [Google Scholar] [CrossRef]
  38. Zhang, D.; Cao, D.; Chen, H. Deep learning decoding of mental state in non-invasive brain computer interface. In AIIPCC ’19, Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing, Sanya, China, 26–28 July 2019; Association for Computing Machinery: New York, NY, USA, 2019. [Google Scholar]
  39. Sravanth Kumar, R.; Srinivas, K.K.; Peddi, A.; Harsha Vardhini, P. Artificial Intelligence based Human Attention Detection through Brain Computer Interface for Health Care Monitoring. In Proceedings of the 2021 IEEE International Conference on Biomedical Engineering, Computer and Information Technology for Health (BECITHCON), Dhaka, Bangladesh, 4–5 December 2021; pp. 42–45. [Google Scholar]
  40. Al-Nafjan, A.; Aldayel, M. Predict Students’ Attention in Online Learning Using EEG Data. Sustainability 2022, 14, 6553. [Google Scholar] [CrossRef]
  41. Khare, S.K.; Bajaj, V.; Sengur, A.; Sinha, G. 10—Classification of mental states from rational dilation wavelet transform and bagged tree classifier using EEG signals. In Artificial Intelligence-Based Brain-Computer Interface; Bajaj, V., Sinha, G., Eds.; Academic Press: Cambridge, MA, USA, 2022; pp. 217–235. [Google Scholar]
  42. Suwida, K.; Hidayati, S.C.; Sarno, R. Application of Machine Learning Algorithm for Mental State Attention Classification Based on Electroencephalogram Signals. In Proceedings of the 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), Jakarta, Indonesia, 16 February 2023; pp. 354–358. [Google Scholar]
  43. Wang, Y.; Nahon, R.; Tartaglione, E.; Mozharovskyi, P.; Nguyen, V.T. Optimized preprocessing and Tiny ML for Attention State Classification. In Proceedings of the 2023 IEEE Statistical Signal Processing Workshop (SSP), Hanoi, Vietnam, 2–5 July 2023; pp. 695–699. [Google Scholar]
  44. Khare, S.K.; Bajaj, V.; Gaikwad, N.B.; Sinha, G.R. Ensemble Wavelet Decomposition-Based Detection of Mental States Using Electroencephalography Signals. Sensors 2023, 23, 7860. [Google Scholar] [CrossRef] [PubMed]
  45. Velaga, N.; Singh, D. The Potential of 1D-CNN for EEG Mental Attention State Detection. Commun. Comput. Inf. Sci. (CCIS) 2024, 2128, 173–185. [Google Scholar]
Figure 1. Preprocessing for EEG signal detection. (a) Butterworth filter. (b) Example of frequency domain transform using STFT.
Figure 2. Preprocessed data of a single bin for comparison between two states.
Figure 3. Proportion of feature reduction using different feature reduction techniques.
Figure 4. Enhanced EEG attention detection system.
Figure 5. The results of the optimized EEG brain attention detection system: (a) the AUC performance of various ML algorithms with the feature reduction; (b) a confusion matrix of the optimized system.
Table 1. Hyperparameter setup for ResNet-18 and GoogLeNet.

Hyperparameter            Value
Loss Function             Cross Entropy [22]
Optimizer                 Adam [23]
Learning Rate Scheduler   Cosine Annealing [24]
Batch Size                32
Initial Learning Rate     0.001
Epochs                    50
Table 2. Hyperparameter setup for other models.

Model           Hyperparameter Setting
Decision Tree   criterion = ‘entropy’
Random Forest   criterion = ‘entropy’
KNN             n_neighbors = 5
SVM             kernel = ‘rbf’, probability = True
Table 3. Condition setup for feature reduction methods.

Method               Feature Reduction Setup
ANOVA                p-value ≤ 0.05
Feature Importance   importance ≥ E[importance of all features]
LCC                  |r| ≥ E[|r| of all features]
PCA                  n_components = ‘mle’ [33]
Table 4. Performance results of various ML models for EEG-based attention detection.

Model       Accuracy   Precision   Recall   F1-Score   Train [Min]
KNN         99.85%     99.81%      99.89%   0.999      ≈0.00
GoogLeNet   99.83%     99.89%      99.78%   0.998      60.08
ResNet-18   99.78%     99.70%      99.85%   0.998      48.70
SVM         96.69%     96.43%      96.98%   0.967      0.150
RF          99.24%     99.26%      99.22%   0.992      0.586
DT          93.59%     93.46%      93.77%   0.936      0.131
Table 6. Comparison of previous studies using the same dataset.

Previous Research               ML Model               States                   Accuracy (%)
C. I. Aci (2019) [8]            SVM                    focus, unfocus, drowsy   91.72
D. Zhang (2019) [38]            CNN                    focus, unfocus, drowsy   96.40
R. Sravanth Kumar (2021) [39]   KNN                    focus, unfocus           97.50
A. Al-Nafjan (2022) [40]        Random Forest          focus, unfocus           96.00
S. K. Khare (2022) [41]         Bagged Tree            focus, unfocus, drowsy   91.77
K. Suwida (2023) [42]           XGBoost                focus, unfocus, drowsy   98.00
Y. Wang (2023) [43]             SVM                    focus, unfocus, drowsy   99.80
S. K. Khare (2023) [44]         Optimizable Ensemble   focus, unfocus, drowsy   97.80
N. Velaga (2024) [45]           CNN                    focus, unfocus, drowsy   98.47
Our Solution (2024)             KNN                    focus, unfocus           99.56