1. Introduction
Alzheimer’s disease (AD) systematically destroys brain neurons over time [
1]. This neurodegenerative disorder progressively leads to cognitive decline, notably in brain regions associated with memory. AD arises from various factors, including environmental influences, vascular diseases, head injuries, genetic predispositions, and, particularly, ageing. More than 50 million people are diagnosed with AD around the globe [
2], and the disorder contributes significantly to disability and dependency among the elderly, ranking as the seventh leading cause of death. Similarly, frontotemporal dementia (FTD) is a neurodegenerative disorder that leads to communication difficulties and behavioural changes. Diagnosing these disorders progresses through several stages: an asymptomatic early pre-clinical phase, a period of mild cognitive impairment, and finally, dementia [
3]. As a consequence, diagnoses at an early stage are crucial. A diagnosis can be achieved by utilising physical exams, cerebrospinal fluid tests, cognitive and language tests, and urine and blood tests. Brain scans can also be adopted, such as Computed or Positron Emission Tomographies (CT/PET) and Magnetic Resonance Imaging (MRI) techniques [
3]. While brain scans offer detailed spatial resolution, they lack the temporal precision to capture dementia’s evolving symptoms. Electroencephalography (EEG), with its superior temporal resolution, can detect subtle brain activities essential for understanding the dynamic neural interactions in dementia. Moreover, EEG’s cost-effectiveness, accessibility, and ability to provide real-time brain activity monitoring make it suitable for broad screening and diagnostics. However, the volume of recorded EEG data and its inherent artefacts often pose difficulty in identifying critical biomarkers and, thus, accurately diagnosing neurodegenerative disorders. Different research approaches have appeared over the years to mitigate these technical issues. Moreover, there is an abundance of EEG denoising pipelines present in the literature, as previous studies have applied various techniques for extracting high-level features from EEG data, such as wavelet transforms [
4], fractal dimensions [
5], entropy-based features [
6], and the Hurst exponent [
7]. Similar techniques have been used for extracting features for detecting and diagnosing Parkinson’s [
8] epilepsy [
9,
10], schizophrenia [
11], and other neurological disorders. However, there is limited research on comparing and assessing the utility of such feature extraction techniques for discriminating Alzheimer's and frontotemporal dementia patients from each other and from healthy controls (HCs).
This study’s objectives are as follows:
To evaluate the effectiveness of sliding windowing, feature extraction techniques, machine learning models, and their pipelines to detect and discriminate AD, FTD, and HC biomarkers.
To identify brain regions affected by AD and FTD and verify if these regions align with standard medical tests.
Our Research Question (RQ):
RQ: How does the choice of sliding windowing, feature extraction measures, and machine learning models affect the detection and differentiation of AD and FTD biomarkers in EEG data?
We examined 50% and 90% overlaps for sliding windowing, multiple feature extraction techniques—Higuchi Fractal Dimension, Singular Value Decomposition (SVD) Entropy, Zero Crossing Rate, Detrended Fluctuation Analysis, and Hjorth parameters—to extract salient high-level features from EEG signals and supervised machine learning techniques—K-Nearest Neighbors (KNN), Random Forest (RF), XGBoost, and Extra Trees (ET)—to discriminate frontotemporal dementia, Alzheimer’s disease, and control groups.
The remainder of this paper includes a description of related work for AD and FTD detection (
Section 2), and the introduction of a comparative design study for feature extraction from EEG data employing machine learning classification techniques (
Section 3). Findings are subsequently presented and critically discussed in
Section 4. The contribution to the body of knowledge is explicated in
Section 5 by synthesising this research and delineating future avenues of work.
2. Related Work
Several investigations have been conducted for diagnosing Alzheimer's disease from biomarkers extracted from electroencephalographic data. Some researchers have employed Hjorth parameters, which are specific statistical properties of EEG data [
12]. Others have employed entropy-based features [
13], standard measures adopted within biomedicine that represent the degree of disorder of an EEG signal. Various research studies have been based upon the computation of EEG source localisations and the extraction of connectivity features of the cortical region [
14] for identifying AD-induced brain network disruptions. Various feature extraction techniques from EEG data exist, and the resulting high-level features are often aggregated and analysed using machine learning. A recent study focused on classifying EEG data from a large dataset of 890 subjects across three categories: healthy controls, mild cognitive impairment subjects, and patients with Alzheimer’s [
15]. Another study scrutinised standard EEG pre-processing techniques using exploratory analysis and highlighted their importance in identifying early AD indicators [
16]. Further research has also emphasised meticulous pre-processing and feature extraction techniques, employing methods like Kolmogorov Complexity [
17], Discrete Wavelet Transform [
18], and Spectral Entropy.
Parallel efforts in FTD diagnosis are also noteworthy. One study presented an automatic FTD detection technique by employing Independent Component Analysis (ICA) in the pre-processing phase of the EEG data and a Light Gradient Boosting (LGB) classifier, achieving 80.67% accuracy [
19]. Another study emphasised the discovery of crucial biomarkers in differentiating FTD from other neurodegenerative diseases, focusing on serum and cerebrospinal fluid markers [
20]. Similarly, various other forms of dementia, such as vascular dementia and mild cognitive impairment, were contrasted using EEG data [
21]. This approach combined Wavelet Transform in the EEG pre-processing phase jointly with Independent Component Analysis (ICA). Feature extraction techniques, including Spectral, Permutation, and Tsallis Entropy, were used to augment original EEG data. The study employed machine learning to train supervised models using Support Vector Machines and the K-Nearest Neighbours learning algorithms, coupled with neighbourhood-preserving QR-decomposition for dimensionality reduction based on fuzzy logic. Similarly, researchers performed feature selection via the improved binary gravitation search approach, leading to a higher classification accuracy of patient groups [
22].
Machine and deep learning-based applications have been widely adopted for solving supervised AD detection with EEG data analysis [
18,
23,
24,
25]. For example, Convolutional Neural Networks (CNNs) have been trained on functional brain connectivity features to detect AD and other neurological disorders automatically [
26]. Similarly, a feed-forward neural network was trained on DNA methylation and gene expressions after employing dimensionality reduction techniques. Another study used convolutional auto-encoders to classify AD, mild cognitive impairment, and healthy control subjects utilising time–frequency high-level features generated from the application of Continuous Wavelet Transform [
27].
Although deep learning has demonstrated a superior capacity to develop models for automatically learning and integrating salient features from EEG data for an improved classification accuracy [
28], such models are often considered difficult to interpret and explain. Studies employing more straightforward learning methods, such as logistic regression, have shown that optimally pre-processed data can still lead to high model performance, and that complex learning strategies are not always necessary [
29]. This suggests that external but transparent and interpretable techniques can often help extract salient features from EEG data better than automated deep learning methods to classify neurodegenerative disorders. Along with the use of more transparent methods, a novel study focused on multimodal EEG and cerebrospinal fluid-related data to distinguish early-onset Alzheimer's disease from FTD subjects by adopting microstate theory and spectral analyses [
30]. In detail, EEG microstates are short time intervals of stable scalp potential fields. This study demonstrates how abnormalities associated with early-onset AD subjects could be detected by analysing the variation in EEG microstate duration and global field power peaks correlating with clinical severity and cerebrospinal fluid biomarkers. Another study extracted statistical features from EEG frequency bands trained with Decision Trees and Random Forests to discriminate Alzheimer’s and frontotemporal dementia subjects. These algorithms are not only more straightforward and interpretable than deep learning algorithms, but they lead to the development of models with remarkable accuracy [
31].
Besides the transparency offered by simpler machine learning algorithms or the capacity of deep learning to deliver highly accurate predictive models, there is the technical problem of extracting salient features from large datasets with a reasonable computational complexity in computer memory and time [
28,
32,
33]. Consequently, external techniques for extracting meaningful high-level features from EEG data are not only helpful for transparency and interpretability but are often required for dimensionality reduction and, thus, for a significant decrement in the computational resources required to train predictive models. In this direction, often, Fast Fourier Transforms [
34], and Discrete Wavelet Transforms [
35] have proven helpful in extracting features from EEG data before training models with machine learning algorithms. Similarly, Multiway Array Decomposition [
36], Principal Dynamic Mode (PDM) analysis [
37], Singular Value Decomposition [
9], and Principal Component Analysis [
38] have all demonstrated utility in such endeavour.
In summary, the body of research on identifying Alzheimer's disease and frontotemporal dementia using electroencephalographic data is extensive, along with the adoption of machine and deep learning to develop improved predictive models. However, evaluating the utility of the various interpretable, ad hoc techniques for extracting salient features and biomarkers from EEG signals associated with these disorders, features that often serve as inputs to the aforementioned learning techniques, remains underexplored.
3. Materials and Methods
This section introduces an empirical work that compares various feature extraction techniques from EEG data for discriminating subjects with Alzheimer’s disorder and frontotemporal dementia from healthy adults.
Figure 1 shows a synthesis of such a novel pipeline, which is divided into five phases: (A) a pre-processing pipeline to denoise EEG data and segment it with a sliding window technique; (B) a phase where various feature extraction techniques for EEG data are contrasted, along with straightforward supervised learning algorithms for discriminating frontotemporal dementia subjects from those with Alzheimer's disease; (C) a training phase that aggregates the extracted features from the previous step into predictive models; (D) an evaluation of such models with unseen testing data across various evaluation metrics; (E) an analytical phase for establishing the importance of EEG channels in the discrimination of Alzheimer's and frontotemporal dementia subjects.
3.1. Dataset
A dataset published in the OpenNeuro repository (ds004504) [
39] was selected. It includes EEG recordings from 88 subjects, of which 23 have frontotemporal dementia (FTD), 36 have Alzheimer’s disease (AD), and 29 are healthy control (HC) subjects. Participants were seated with eyes closed during the recordings (resting state) and were provided with the Mini-Mental State Examination test for cognitive and neurophysical assessment. Recordings were acquired at AHEPA University Hospital in Thessaloniki, Greece. The EEG2100 equipment from the Nihon Kohden Group was used. Nineteen scalp electrodes were used in line with the 10–20 international standard (Fp1, F7, F3, T3, C3, T5, P3, O1, Fz, Cz, Pz, Fp2, F4, F8, C4, T4, P4, T6, O2), with A1 and A2 used as references.
Recordings were sampled at 500 Hz, and the amplifier’s settings were tuned to 10 μV/mm, with a time constant of s and a high-frequency filter of 70 Hz. Recording lengths varied between min for the Alzheimer’s subjects and 12 min for the frontotemporal subjects, with min for the healthy controls. Overall, the dataset had , , and 402 min of recordings, respectively, for AD, FTD, and healthy controls.
3.2. Pre-Processing Phase
The pre-processing phase associated with the EEG signals was initially executed by the researchers who recorded the data [
39]. This included the execution of a Butterworth band-pass filter (0.5–45 Hz), the re-referencing of channels to the A1 and A2 electrodes, and artefact elimination using ICA. The EEG data were denoised using Artefact Subspace Reconstruction, a method available in the EEGLAB MATLAB toolbox. The RunICA algorithm was run to transform the nineteen EEG channels into independent components. Those containing
ocular noise or
jaw artefacts, via visual inspection, were zeroed, and inverse Independent Component Analysis (ICA) was executed. We extended preprocessing by using a sliding window technique to segment EEG data into overlapping 1-second windows, applying two strategies of 50% and 90% overlap, consistent with methods used in similar studies [
40,
41,
42,
43]. The rationale for contrasting two distinct overlapping window strategies was to evaluate the capability to extract time-domain biomarkers from limited EEG data and, thus, the feature extraction’s efficacy in discerning AD and FTD’s key attributes. Note that a 50% overlap strategy is faster than a 90% overlap strategy. Each window containing 500 data points (because of the 500 Hz sampling rate) led to the separate execution of the various selected feature extraction techniques and their different outputs. Each technique yielded 19 columns, the EEG channels, and
N rows, the segmented overlapping EEG windows, for each subject. These per-subject tables were then concatenated, yielding a final table whose rows correspond to all extracted windows across subjects (the window count per subject depends on the recording length, the 1 s window length at the 500 Hz sampling rate, and the chosen overlap) and whose 19 columns correspond to the EEG channels. The dataset was unbalanced across the target feature (subject class: AD, FTD, HC); thus, the Synthetic Minority Oversampling Technique (SMOTE) was applied. In detail, oversampling was performed for the 23 subjects with frontotemporal dementia (FTD) and the 29 healthy controls (HC) to match the 36 subjects with Alzheimer's disease (AD).
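To make the segmentation step concrete, the sliding window procedure can be sketched as follows (a minimal NumPy illustration; the function name and defaults are ours, not from the original pipeline):

```python
import numpy as np

def sliding_windows(signal, fs=500, win_sec=1.0, overlap=0.5):
    """Segment a 1-D EEG channel into overlapping fixed-length windows.

    overlap=0.5 and overlap=0.9 correspond to the two strategies compared
    in this study; each 1-s window holds 500 samples at 500 Hz.
    """
    win = int(win_sec * fs)                         # samples per window
    step = max(1, int(round(win * (1 - overlap))))  # hop size between windows
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

# e.g. 10 s of data yields 19 windows at 50% overlap and 91 at 90% overlap
```

For a 19-channel recording, the same segmentation is applied per channel, producing the per-subject window tables described above.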
3.3. Feature Extraction Techniques
As mentioned above, several techniques for feature extraction from EEG data were considered. These include the Singular Value Decomposition (SVD) Entropy, the Higuchi Dimension (HFD) based on fractal geometry, the Zero-Crossing Rate statistical indicator, the Detrended Fluctuation Analysis (DFA), and the Hjorth indicator. The PyEEG and Antropy Python libraries were used, and each individual technique is concisely detailed in the following parts of the text.
The Singular Value Decomposition (SVD) Entropy indicator is based on time series complexity. It evaluates the necessary number of orthogonal vectors for accurate data representation [
44,
45,
46]. Mathematically,

H_{\mathrm{SVD}} = - \sum_{i=1}^{M} \bar{\sigma}_i \log_2 \bar{\sigma}_i,

SVD Entropy correlates with the complexity of the underlying data, where \bar{\sigma}_1, \ldots, \bar{\sigma}_M are the embedded matrix's normalised singular values.
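A NumPy sketch of this computation follows (the Antropy library mentioned above exposes an equivalent `svd_entropy` function; the embedding parameters below are illustrative defaults):

```python
import numpy as np

def svd_entropy(x, order=3, delay=1):
    """Shannon entropy (base 2) of the normalised singular values of a
    delay-embedded signal: higher values indicate a more complex signal."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    # Embedding matrix: row i is [x[i], x[i+delay], ..., x[i+(order-1)*delay]]
    emb = np.array([x[i:i + (order - 1) * delay + 1:delay] for i in range(n)])
    s = np.linalg.svd(emb, compute_uv=False)
    s_norm = s / s.sum()
    s_norm = s_norm[s_norm > 0]   # drop zero singular values (log undefined)
    return float(-np.sum(s_norm * np.log2(s_norm)))
```

A pure sinusoid occupies fewer orthogonal directions than white noise, so its SVD Entropy is lower.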
The Higuchi Fractal Dimension (HFD) technique focuses on the non-parametric time series analysis based on generating new synthetic signals by a systematic procedure that sub-samples from the original data [
47]. Mathematically, a family of sub-sampled series is constructed as

X_k^m : \; x(m),\; x(m+k),\; x(m+2k),\; \ldots,\; x\!\left( m + \left\lfloor \tfrac{N-m}{k} \right\rfloor k \right),

where k is the interval length, and m is the initial point. Each sub-sampled series is subsequently utilised to calculate the average curve length

L_m(k) = \frac{1}{k} \left[ \left( \sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left| x(m+ik) - x(m+(i-1)k) \right| \right) \frac{N-1}{\left\lfloor \tfrac{N-m}{k} \right\rfloor k} \right],

where the term (N-1) / \left( \left\lfloor \tfrac{N-m}{k} \right\rfloor k \right) is a normalisation factor. The mean curve length L(k), averaged over all m, adheres to a power law, L(k) \propto k^{-D}, defining the Fractal Dimension D. HFD's applicability to non-stationary series differentiates it from methods like Spectral and Hurst Exponents [48].
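The procedure above can be sketched in NumPy as follows (a standard rendering of Higuchi's algorithm; `kmax` is an illustrative choice):

```python
import numpy as np

def higuchi_fd(x, kmax=10):
    """Higuchi Fractal Dimension: slope of log L(k) versus log(1/k)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    L = []
    for k in range(1, kmax + 1):
        Lm = []
        for m in range(k):                         # one sub-sampled series per start point
            idx = np.arange(m, N, k)               # x(m), x(m+k), x(m+2k), ...
            length = np.abs(np.diff(x[idx])).sum()
            norm = (N - 1) / ((len(idx) - 1) * k)  # normalisation factor
            Lm.append(length * norm / k)
        L.append(np.mean(Lm))
    # Power law L(k) ~ k^(-D): D is the slope against log(1/k)
    return float(np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(L), 1)[0])
```

White noise yields D close to 2, while a smooth low-frequency signal yields D close to 1.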
The Zero-Crossing Rate (ZCR) represents the rate of the change in the sign of a signal, essentially counting how many transitions of the zero amplitude exist within a time frame [
49,
50]. Mathematically,

Z = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathbb{1}\left\{ x_t \, x_{t+1} < 0 \right\},

where
Z represents the Zero-Crossing Rate;
T is the total number of samples in the signal (or in a specified window/frame of the signal for localised analysis);
x_t and x_{t+1} are consecutive samples in the signal at times t and t+1, respectively;
\mathbb{1}\{\cdot\} is an indicator function that evaluates to 1 if the product of x_t and x_{t+1} is less than 0 (indicating a zero crossing, where the sign of the signal changes between two consecutive samples), and 0 otherwise;
the denominator T-1 normalises the sum to account for the number of intervals between samples, providing a rate per sample interval.

The calculation accumulates the total number of zero crossings: for each pair of consecutive samples x_t and x_{t+1}, from the second sample up to the last, the indicator contributes 1 when their product is negative (the samples have opposite signs) and 0 when it is positive (the samples share the same sign; if either sample is exactly zero, the outcome depends on how zeros are treated). Dividing this total by T-1, the number of intervals between consecutive samples in the signal or window under consideration, yields the Zero-Crossing Rate Z, a normalised measure of how frequently the signal crosses zero, providing insight into the signal's properties, especially its frequency content and texture.
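The ZCR computation reduces to a few lines of pure Python (the helper name is ours):

```python
def zero_crossing_rate(x):
    """Fraction of consecutive-sample pairs whose product is negative,
    i.e. zero crossings normalised by the number of sample intervals."""
    crossings = sum(1 for a, b in zip(x[:-1], x[1:]) if a * b < 0)
    return crossings / (len(x) - 1)
```

An alternating signal crosses zero at every step (rate 1.0), while a monotone signal never does (rate 0.0).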
The Detrended Fluctuation Analysis (DFA) examines non-stationary time series for persistent patterns and long-range correlations. It involves integrating the series,

y(k) = \sum_{i=1}^{k} \left( x_i - \bar{x} \right),

segmenting the profile y(k) into boxes of length n, detrending each segment, and calculating the fluctuation magnitude

F(n) = \sqrt{ \frac{1}{N} \sum_{k=1}^{N} \left( y(k) - y_n(k) \right)^2 }.

The scaling exponent is then obtained by contrasting the gradients of \log F(n) and \log n, where the detrending step is the subtraction of the local least-squares trend y_n(k) fitted within each segment of length n.
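A compact NumPy sketch of these steps, using first-order (linear) detrending and an illustrative set of box sizes:

```python
import numpy as np

def dfa(x, scales=(4, 8, 16, 32, 64)):
    """Detrended Fluctuation Analysis: scaling exponent alpha of F(n) ~ n^alpha."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())          # integrated (profile) series
    F = []
    for n in scales:
        n_seg = len(y) // n
        sq = []
        for i in range(n_seg):
            seg = y[i * n:(i + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # local linear trend
            sq.append(np.mean((seg - trend) ** 2))
        F.append(np.sqrt(np.mean(sq)))
    return float(np.polyfit(np.log(scales), np.log(F), 1)[0])
```

Uncorrelated white noise gives an exponent near 0.5; its cumulative sum (a random walk) gives an exponent near 1.5.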
The Hjorth parameters can be used to gauge insights into the characteristics of a signal, such as its regularity and frequency [
46]. Three main parameters exist, namely, Activity, Mobility, and Complexity. Mathematically:

\mathrm{Activity} = \operatorname{var}\left( x(t) \right), \qquad
\mathrm{Mobility} = \sqrt{ \frac{\operatorname{var}\left( x'(t) \right)}{\operatorname{var}\left( x(t) \right)} }, \qquad
\mathrm{Complexity} = \frac{\mathrm{Mobility}\left( x'(t) \right)}{\mathrm{Mobility}\left( x(t) \right)}.

Activity measures signal variance, Mobility indicates signal frequency, and Complexity assesses frequency variation. This research employs the average values of Complexity and Mobility derived from EEG signals.
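These three parameters follow directly from the variance of the signal and of its successive discrete derivatives; a NumPy sketch:

```python
import numpy as np

def hjorth_parameters(x):
    """Return (Activity, Mobility, Complexity) of a 1-D signal,
    using first differences as a discrete derivative."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / np.var(x))
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return float(activity), float(mobility), float(complexity)
```

For a pure sinusoid, Activity equals half the squared amplitude, Mobility is proportional to the frequency, and Complexity is close to 1.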
3.4. Classification
The aforementioned techniques lead to features that are the input of various machine learning algorithms. A preliminary investigation of many learning algorithms was performed with the features extracted using the SVD Entropy technique and a 50% overlapping strategy among EEG windows. A 15-fold cross-validation strategy was employed, given the limited number of participants in the selected datasets. This preliminary training aimed to deliver an initial understanding of the capacity of each learning algorithm to fit the target feature (AD, FTD, HC) and minimise the computational power and time required for developing many models. The performance measures included the accuracy, precision, recall, F1-score averages, and area under the ROC curve (AUC) averages of the 15 surrogate models (
Table 1).
We selected the four top-performing learning techniques based on their preliminary results (
Table 1) to conduct a more focused comparative analysis, integrating sliding window techniques and feature extraction measures. This selection aimed to enhance the efficiency and depth of our model evaluation process by concentrating on the most promising algorithms. The four models are described in detail as follows:
K-Nearest Neighbors (KNN)—This technique can be used for supervised classification and regression, assuming that similar instances of a dataset cluster together. It is non-parametric and uses proximity to make classifications about the clustering of a new data point.
Random Forest (RF)—This technique constructs numerous decision trees during training. On the one hand, a supervised classification determines the most frequent class predicted by the single individual trees. On the other hand, for regression problems, it computes the average of such predictions. It incorporates randomness and includes sampling data with a replacement step to prevent model overfitting for individual tree learning by focusing only on a sub-set of data at each iterative tree’s split.
XGBoost—It is an enhanced and efficient form of Gradient Boosting that integrates regularisation as a form of model complexity control to mitigate overfitting. It also includes system-level enhancements to improve efficiency and flexibility, forming a robust predictive ensemble learning technique.
Extra Trees (ET)—It is an ensemble of decision trees with additional randomness: unlike Random Forest, it typically trains each tree on the full training set rather than bootstrap samples and chooses split points for features at random. This extra randomness may reduce variance and improve generalisation to new data.
The above data-driven learning techniques were subsequently trained on the features extracted using the aforementioned techniques (
Table 1). In these circumstances, the dataset was partitioned into a training set and a test set (80% and 20% of the original data, respectively). To avoid data leakage at the subject level, stringent checks ensured that the training and testing sets contained no features extracted from the same subjects; this was verified by maintaining subject-specific annotations throughout the feature extraction process. For instance, if the training set included subjects 1 to 4, no EEG features from these subjects appeared in the testing set, and any such features found there were moved into the training data, preserving the 80:20 split while guaranteeing no subject overlap between the sets. Given the limited number of subjects, the training set was stratified using 15-fold cross-validation: the training data were divided into 15 folds, with 14 used for training and one for validation at each iteration. The top-performing models were selected and further tested on the unbalanced test set (20% of the overall data). Note that these test data were not augmented with SMOTE but left intact, retaining their original class distribution. The above training procedure was performed twice: once with the 50% and once with the 90% overlapping strategy among the segmented EEG windows. Three classification tasks were devised: discriminating AD patients from healthy controls, FTD patients from healthy controls, and AD from FTD patients.
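The subject-level leakage check described above can be sketched as follows (pure Python; the data layout and function name are illustrative):

```python
import random

def subject_wise_split(windows, test_ratio=0.2, seed=0):
    """Partition (subject_id, features) pairs so that no subject
    contributes windows to both the training and the testing set."""
    subjects = sorted({sid for sid, _ in windows})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_ratio))
    test_ids = set(subjects[:n_test])
    train = [w for w in windows if w[0] not in test_ids]
    test = [w for w in windows if w[0] in test_ids]
    return train, test
```

Splitting by subject rather than by window is what prevents near-duplicate overlapping windows from the same recording from appearing on both sides of the split.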
To further enhance the robustness of the designed multi-phase pipeline across the three classification tasks (
Figure 1), the best-performing configuration (feature extraction technique/learning strategy) was repeated by employing five different seeds. This was aimed at generating different training and testing data five times. The averaged metrics from these five runs shaped the final results. Following the classification phase, we employed topographical brain mapping techniques to improve our predictive models’ interpretability. These maps were instrumental in pinpointing the cerebral regions most crucial for distinguishing between Alzheimer’s disease (AD) patients, frontotemporal dementia (FTD) patients, and healthy controls (HC). We selected the classification model with the highest accuracy and computed a feature importance array for each feature extraction technique, as detailed in
Section 3.3. This array delineated the significance of features derived from each EEG channel in accurately predicting AD and FTD. Subsequently, we visualised these feature importance scores on topographic brain maps, illuminating the brain areas with elevated significance in the classification process.
The primary objective of this analytical step was to derive deeper insights into the specific brain regions that are most influential in the differentiation between AD patients, FTD patients, and healthy controls, thereby enhancing our understanding of the neurophysiological underpinnings of these disorders. This approach not only aids in validating the predictive models but also contributes to the broader field of neuroscientific research by identifying potential biomarkers and neuroanatomical correlates of these neurodegenerative diseases.
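Before plotting, the per-channel importance analysis amounts to ranking the 19 channels by the model's feature importance scores; a minimal sketch (the helper and scores are illustrative):

```python
def rank_channels(importances, channels):
    """Pair each EEG channel with its feature importance score and
    sort in descending order of importance."""
    return sorted(zip(channels, importances), key=lambda p: p[1], reverse=True)

# The 19 channels of the 10-20 montage used in this study
CHANNELS = ["Fp1", "F7", "F3", "T3", "C3", "T5", "P3", "O1", "Fz", "Cz",
            "Pz", "Fp2", "F4", "F8", "C4", "T4", "P4", "T6", "O2"]
```

In practice the importance array would come from the trained tree-based model (e.g. its `feature_importances_` attribute), and the ranked pairs feed the topographic plotting step.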
3.5. Hyperparameter Tuning
Optimising model hyperparameters was systematically conducted utilising the GridSearchCV module in Python. The aim was to identify the most effective parameter configurations for each machine learning model examined. Summarised below are the optimal settings discovered for each model:
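GridSearchCV performs an exhaustive search over a parameter grid; the underlying idea can be sketched in pure Python (the grid and scoring function here are purely illustrative, not the settings found in this study):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid with score_fn and
    return the best-scoring configuration."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)   # e.g. a cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```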
3.6. Model Evaluation
Each trained model was evaluated using different evaluation metrics:
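The metrics reported in the results (accuracy, precision, recall/sensitivity, and F1-score) can be computed from the confusion-matrix counts; a pure-Python sketch for the binary case (the AUC additionally requires predicted scores rather than labels):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (sensitivity), and F1-score
    for binary labels, with the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```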
4. Results and Discussion
Table 2 shows the performance metrics (from
Section 3.6) of our models, evaluated using the original, unbalanced dataset. On the one hand, the results demonstrate that Alzheimer’s disease subjects could be discriminated with a superior precision (94–96%) compared to the frontotemporal disease subjects and healthy controls. On the other hand, the discrimination of FTD subjects always had the lowest precision across comparisons (86–88%). Sensitivity scores were always above 90% across the three comparisons, along with the F1-scores.
The observed lower precision in discriminating frontotemporal dementia (FTD) subjects from healthy controls, as compared to discriminating Alzheimer’s disease (AD) subjects, can be attributed to several factors intrinsic to the nature of these neurological conditions and the characteristics of the EEG signals they produce. The discrepancy in outcomes using feature extraction techniques (SVD Entropy, Detrended Fluctuation Analysis, Zero Crossing Rate, Higuchi Fractal Dimensions, Hjorth parameters) and machine learning algorithms (XGBoost, Random Forest, Extra Trees) may stem from the following:
Overlap in EEG signal characteristics: FTD and healthy control EEG signals might share more similar characteristics than those observed between AD and healthy controls. FTD, particularly in its early stages, can manifest subtle EEG changes that are less distinct than those seen in AD, where more pronounced disruptions in brain activity patterns are common. This overlap makes it challenging for the applied feature extraction techniques to capture distinctive features that accurately differentiate FTD from healthy brain activity.
Sensitivity and specificity of features: The feature extraction techniques employed may have differing sensitivities and specificities to the pathological changes in brain activity characteristic of FTD versus AD. For instance, features that are highly sensitive to global cognitive decline and widespread neural network disruption in AD may not be as effective in detecting the more localised or less severe disruptions typical of FTD.
Stage of the disease: The stage of disease at the time of EEG recording could also impact the precision of discrimination. Early-stage FTD may produce very subtle EEG abnormalities that are difficult to distinguish from normal ageing processes, whereas AD-related changes, such as increased slow-wave activity, might be more evident and easier to detect even at earlier stages.
Technical and methodological limitations: The choice of window size for EEG analysis, preprocessing steps, and the specific parameters used in both feature extraction and machine learning algorithms could preferentially favour the detection of AD over FTD. Optimising these methodologies specifically for FTD might require adjustments to better capture the nuanced differences in EEG signals associated with FTD.
Variability within FTD spectrum: FTD encompasses a spectrum of disorders with heterogeneous clinical presentations, including behavioural variant FTD (bvFTD) and primary progressive aphasias. This variability contributes to a wider range of EEG signal manifestations, complicating the task of identifying a consistent set of features that distinguish FTD patients from healthy individuals across all subtypes.
Figure 2 shows the performance of models trained on balanced data with SMOTE in distinguishing Alzheimer’s disease subjects from healthy controls (top row), frontotemporal dementia subjects from healthy controls (middle row), and Alzheimer’s disease versus frontotemporal dementia subjects (bottom row) with the 50% and 90% overlapping windows strategy, over 5 runs. The details of these results are presented in
Table 3,
Table 4,
Table 5,
Table 6,
Table 7 and
Table 8.
While no consistent differences in accuracy across learning techniques were found, the Singular Value Decomposition Entropy technique consistently yielded the predictive models with the highest accuracy, regardless of the underlying machine learning technique. This can also be observed in
Figure 3, where the average accuracy of each sliding window strategy is presented, grouped across the feature extraction measures. The same consistency cannot be claimed for the other feature extraction techniques, which showed no stable pattern across machine learning models and data overlapping strategies.
The higher accuracy and F1-score associated with SVD Entropy in this study can be attributed to the following functionalities that it exhibits:
The Singular Value Decomposition (SVD) process breaks down the EEG signal into matrices that ignore the noise and preserve the principal characteristics of the EEG signal.
Furthermore, SVD reduces the dimensionality of the data, abstracting it into a form that retains essential information while discarding redundancy. This abstraction makes it easier for machine learning models to process and learn from the data, enhancing predictive performance.
Additionally, assessing the entropy in the distribution of singular values obtained from SVD quantifies the randomness and complexity of the signal. This is crucial for EEG analysis, where the complexity of brain activity can provide insights into neurological conditions.
Concerning the overlapping strategies, using the 90% overlapping strategy across consecutive EEG windows clearly exhibited utility when discriminating subjects with neurodegenerative disorders (AD and FTD versus HC).
Following the classification phase, we employed a topographical brain mapping technique to improve the interpretability of our predictive models. These maps were instrumental in pinpointing the cerebral regions most crucial for distinguishing AD/FTD patients from HC and AD patients from FTD patients (
Figure 4,
Figure 5,
Figure 6,
Figure 7,
Figure 8 and
Figure 9). We selected the classification model with the highest accuracy (90% overlap, SVD entropy, and a tree classifier) and computed a feature importance array for each feature extraction technique, as detailed in
Section 3.3. This array delineated the significance of features derived from each EEG channel in accurately predicting disease; these importance scores were then visualised on topographic brain maps, illuminating the brain areas with elevated significance in the classification process. As outlined at the end of Section 3.4, this analytical step aimed to identify the brain regions most influential in differentiating AD/FTD patients from healthy controls, thereby aiding model validation and pointing to potential biomarkers and neuroanatomical correlates of these neurodegenerative diseases.
Topographic maps indicated the importance of the occipital, frontal, and temporal lobes in distinguishing AD from HC (Figure 4 and Figure 5), highlighting specific EEG channels (O2, T5, O1, Fp2, Fp1, F7, F8, T3, T4). Similar regions were critical for differentiating FTD from HC (Figure 6 and Figure 7), but with a different order of significance (O2, Fp1, T3, O1, F7, Fp2, F3, T4, T5); the notable emphasis on the frontal lobe suggests its effectiveness in capturing FTD features from the frontal region. In contrast, when differentiating AD patients from FTD patients (Figure 8 and Figure 9), the topographic plots show the frontal and temporal regions, especially channels T3, Fp1, Fp2, F7, F8, T4, F3, Fz, and F4, as more important than the occipital region. This finding marks a key distinction from the feature importance patterns in the AD/FTD vs. HC comparisons, where occipital dominance was observed.
Topographic analyses show the occipital, temporal, and frontal regions’ involvement in distinguishing AD from HC. This aligns with empirical observations about AD’s impact areas. The frontal and temporal regions are primarily involved in differentiating AD from FTD, which is consistent with FTD’s primary impact areas.
While topographic maps highlight the occipital region in distinguishing FTD from HC, this may seem unexpected given FTD's primary impact on the frontal and temporal regions [51,52]. However, it aligns with the involvement of the occipital lobe in advanced FTD stages, where it shares degeneration patterns with AD [53,54]. This overlap and variability in FTD presentation underscore the need for accurate differential diagnosis between the two diseases [54].
5. Conclusions
Alzheimer's disease and frontotemporal dementia, both resulting from neuronal damage, impair cognitive functions. Effective denoising and feature extraction from complex, noisy EEG data, with an emphasis on dimensionality reduction and the identification of key biomarkers, are essential for their early detection.
Previous research on Alzheimer's disease and frontotemporal dementia used limited feature extraction methods without thorough comparison. This study addressed that gap by evaluating multiple techniques for distinguishing AD patients, FTD patients, and healthy controls using EEG data. We trained models on features from EEG windows with 50% and 90% overlap, employing classifiers such as K-Nearest Neighbors, Random Forest, XGBoost, and Extra Trees. The findings reveal that an increased overlap between EEG windows enhances model accuracy, and highlight the effectiveness of SVD entropy over the other techniques. Our model accurately distinguishes AD from FTD, pinpointing critical features in the frontal, temporal, and occipital regions. This advances early-stage diagnosis by highlighting distinct EEG patterns specific to each disease.
Future directions include validating this pipeline across broader datasets and more diverse subject groups, including AD, FTD, and healthy controls, and extending its utility to the diagnosis of other neurological disorders such as schizophrenia and Parkinson's disease. Further investigation is also needed to determine the optimal EEG window overlap for effective feature extraction.