1. Introduction
There is mounting evidence suggesting that AD may be primarily a synaptic disorder [
1] and synaptic abnormalities occur before any clinical symptoms. EEG measures instantaneous excitatory and inhibitory postsynaptic potentials [
2], and thus provides a powerful non-invasive tool to capture synaptic dysfunction underlying very early cognitive changes in AD. The superior temporal resolution of EEG makes it especially advantageous in detecting changes in complex multi-stage cognitive processes such as memory, a key indicator of early AD [
3]. A large number of studies have demonstrated that EEG measures, including event-related potentials (ERPs) and oscillations, are sensitive to subtle brain changes in early AD [
4,
5,
6]. Applying a word repetition paradigm, designed to elicit brain activity related to language and memory processing, our laboratory has identified several ERP/oscillatory measures that reliably distinguish mild cognitive impairment (MCI) and early-stage AD patients from healthy elderly controls [
7,
8,
9,
10,
11,
12]. For example, our ERP studies revealed that the N400 component, sensitive to semantic processing and integration, and the P600 (or ‘Late Positive Component’, LPC), sensitive to explicit verbal memory, can be reliably elicited in healthy elderly but not in MCI or AD patients [
7,
8,
9,
13]. In mild AD, both the N400 and the P600 word repetition effects are diminished [
13], whereas MCI and preclinical AD patients show compromised P600 but relatively preserved N400 effects [
7,
8,
9]. Similarly, our EEG oscillatory analyses revealed a power suppression in the alpha range (9–11 Hz) that is attenuated for repeated relative to new words in healthy elderly [
10]. This alpha word repetition effect is also compromised in amnestic MCI and correlated with verbal memory measures [
10].
A limitation of traditional ERP/oscillatory analyses is that they usually focus on the timing and magnitude of pre-defined components at the expense of the overall pattern and complexity of the EEG data. Some prior work converts EEG signals to visibility graphs (VGs) [
14], which preserve many features of the original EEG signal. Converting resting state EEG signals to VGs allows for discriminative graph features to be discovered [
15] and utilized in high-accuracy neural network based classification (98%) between AD patients and normal elderly [
16].
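To make the conversion concrete, the following is a minimal sketch of the natural visibility graph construction that is commonly used for time series. It assumes uniformly spaced samples (indices stand in for time) and uses a made-up toy series rather than EEG data. The function name `natural_visibility_graph` is illustrative and is not taken from the studies cited above, which may use a different VG variant.

```python
from itertools import combinations


def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph of a 1-D series.

    Samples i < j are linked when every sample k between them lies strictly
    below the straight line joining (i, series[i]) and (j, series[j]).
    Sample indices stand in for time, i.e. uniform sampling is assumed.
    """
    edges = set()
    for i, j in combinations(range(len(series)), 2):
        yi, yj = series[i], series[j]
        visible = all(
            series[k] < yi + (yj - yi) * (k - i) / (j - i)
            for k in range(i + 1, j)
        )
        if visible:
            edges.add((i, j))
    return edges


# Toy series (made-up values, not EEG data).
toy = [1.0, 3.0, 2.0, 4.0, 1.0]
print(sorted(natural_visibility_graph(toy)))
# -> [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
```

Each sample becomes a node, so peaks in the waveform tend to become highly connected hubs. This brute-force version checks every pair of samples and is intended for illustration only.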
Other studies which have applied neural networks or other machine learning algorithms to resting state EEG in AD include Morabito et al. [
17], who used convolutional neural networks on 19 channel EEG and achieved a three-class AD/MCI/cognitively normal (CN) classification accuracy of 82% [
17,
18]. Zhao and He [
19] combined deep belief networks with support vector machines on 16 channel EEG signals and achieved 92% accuracy classifying AD vs. CN [
18,
19]. Duan et al. [
20] quantified between-channel connectivity of resting-state EEG signals in MCI and mild AD patients using coherence measures; they used the ResNet-18 model [
21] to classify between MCI and controls, and AD and controls with an average 93% and 98.5% accuracy, respectively.
Despite the promise of the above studies and other machine learning algorithms which have used biomarkers of AD to improve diagnostic accuracy [
22], there are still to date no widely used machine learning algorithms for the clinical diagnosis of AD. Historically, clinical diagnosis of possible and probable AD (generally found to be between 80 and 90% accurate in clinicopathological studies) was based on recognizing the typical cognitive and behavioral symptoms of this dementia and the exclusion of other possible causes of dementia, whereas a “definite AD” diagnosis was only possible via invasive brain measures from a biopsy or autopsy providing histopathological evidence of AD [
23]. Currently, the International Working Group (IWG) recommends that the clinical diagnosis of AD be restricted to those with positive biomarkers together with specific AD phenotypes [
24]. While purely biological definitions of AD (e.g., [
25]) have become more widely used for research purposes in recent years, the IWG considers the present limitations of biomarkers sufficient that they should not be used for the diagnosis of disease in the cognitively unimpaired [
24]. Thus, the “gold standard” for the clinical diagnosis of AD is criteria (e.g., [
23,
26]) which incorporate multiple biomarkers (including markers of amyloid-β (Aβ) and tau pathology, neuronal injury and neurodegeneration) along with the clinical phenotype. With the rapid emergence of machine learning algorithms into medical research, this could, however, change rapidly in upcoming years [
27].
Our hypothesis is that word repetition tasks, which have been shown to be sensitive to MCI-to-AD conversion and even to preclinical AD using ERPs [
7,
8,
11], can also be used to discriminate AD from normal elderly with high accuracy using a VG-based machine learning approach. Compared to resting state EEG, word repetition task signals are expected to yield better discriminative features given that verbal memory impairments are the best predictors of MCI to AD conversion [
28]. Combining these two lines of past work, we converted EEG signals recorded during word repetition experiments to visibility graphs. We operated under the assumption that the ERP components of interest will be preserved after conversion to graphs and features extracted from these graphs will encode the ERP components while reducing variance across subject data for better downstream machine learning classification performance.
Therefore, this work focuses on analyzing EEG signals and extracting features from them that are useful for discriminating between AD and robust normal elderly (RNE, normal elderly persons who have remained cognitively normal for the duration of follow-up) across a variety of machine learning algorithms. To demonstrate the generalizability of those features, we tested whether they can also effectively discriminate between prodromal Alzheimer’s (pAD, MCI patients who converted to Alzheimer’s dementia within 3 years) and RNE. We apply a similar approach to that of Ahmadlou et al. [
16], although extracting many more features (including many novel ones in this context) from word repetition task EEG signals (instead of resting state as in Ahmadlou et al. [
16]).
In our framework, pictured in
Figure 1, we first collect EEG data from word repetition tasks. We then perform pre-processing of this data and then convert the EEG signals to visibility graphs. From these VGs we extract 12 features and perform statistical tests for feature selection, keeping the discovered statistically significant predictors as inputs for machine learning algorithms. Finally, the dimensionality of this feature space is reduced with principal component analysis and we use the resulting reduced feature space as inputs to machine learning algorithms.
In summary, the intended contributions of this work are threefold:
We demonstrate the effectiveness of EEG analysis on word repetition tasks for dementia classification (AD vs. RNE) across various machine learning algorithms (support vector machines, logistic regression, linear discriminant analysis, neural networks);
We select a new set of high-performing features under a framework for EEG visibility graph analysis that, when combined with existing features from the literature, detect even earlier-stage AD (i.e., discriminate pAD vs. RNE);
We open source our code so that it can be adapted for other datasets and tasks (e.g., resting state EEG data or discriminating other types of dementia).
6. Discussion and Conclusions
This EEG/ERP word repetition paradigm has been shown to be sensitive to MCI and the conversion from MCI to AD [
7,
8,
10]. The recent development of VGs for EEG allows for a more holistic measure of EEG time series using graph features. By combining VGs with the EEG word repetition paradigm, we are able to discriminate AD from RNE with a perfect accuracy of 100% using linear classifiers and generalize these same features for pAD vs. RNE classification with an accuracy of 92.5%—on par with previous work directly comparing pAD and RNE [
60,
61,
62,
63,
64,
65,
66]. Our analysis demonstrates the effectiveness of looking at word repetition EEG tasks for the features we selected for this visibility graph approach.
A number of graph features including GIC, global efficiency, clustering coefficient, small-worldness and local efficiency were already confirmed to be significant in some band–electrode combinations in resting state EEG VG studies comparing Alzheimer’s to RNE [
15,
16]. Our results extend these findings by showing that these features also discriminate AD from RNE using a word repetition task EEG paradigm. Novel features introduced in this paper have been shown to encode more differences between AD and RNE in word repetition trials. To minimize type I error, we utilized PCA to reduce the number of input metrics used for classification. Both novel and previously studied features appeared in the top two components of our PCA loading table (
Table 4). The most common features were global efficiency, density, TSP cost and GIC. Two of these features, namely TSP cost and density, are among the six novel ones we introduced. The presence of these features generally points to a difference in EEG time series structure between groups, especially with regard to voltage differences and overall structural differences in the waveforms. We note that min cut size, max clique size and independence number also appear in
Table 4, indicating that five out of six of the novel features we introduced are important for prediction.
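As an illustration of two of the features named above, the sketch below computes graph density and global efficiency from an undirected edge list using their standard textbook definitions. The function names and the toy edge list are hypothetical, and the definitions used in our analysis code may differ in detail (for example, in how disconnected nodes are handled).

```python
from collections import deque


def graph_density(n_nodes, edges):
    """Fraction of possible undirected edges that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def global_efficiency(n_nodes, edges):
    """Mean inverse shortest-path length over all ordered node pairs."""
    neighbours = {v: set() for v in range(n_nodes)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    total = 0.0
    for source in range(n_nodes):
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        # Unreachable nodes contribute 0 (inverse of infinite distance).
        total += sum(1 / d for d in dist.values() if d > 0)
    return total / (n_nodes * (n_nodes - 1))


# Hypothetical 5-node graph, e.g. the visibility graph of a 5-sample series.
toy_edges = {(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)}
print(graph_density(5, toy_edges))       # -> 0.5
print(global_efficiency(5, toy_edges))
```

Both quantities lie between 0 and 1 and equal 1 for a complete graph. Density counts direct links only, whereas global efficiency also rewards short indirect paths between distant samples.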
Learned graph features, representing group differences in the morphology of EEG time series, may reflect AD pathological changes in the neural generators of ERPs, including N400 and P600. Putative N400 generators have been found in the anterior fusiform gyri and other temporal cortical regions [
67,
68]. The primary neural generators of the P600 word repetition effect were localized by functional MRI to the hippocampus, parahippocampal gyrus, cingulate, left inferior parietal cortex and inferior frontal gyrus [
69,
70]. Extended synaptic failure in these regions due to AD pathology may account for the N400 and P600 abnormalities in AD and prodromal AD patients. For example, abnormal memory-related P600 may be associated with tau load in the medial temporal lobe (MTL), including the hippocampus, entorhinal and perirhinal cortices, based on the evidence that early tau accumulation in these regions correlates with lower memory performance and reductions in functional connectivity between the MTL and cortical memory systems [
71].
Using raw and bandpass filtered EEG data, we find that the
band produced the largest number of features, closely followed by the
band. Neural oscillations in different frequency bands are thought to carry different spatial and temporal dimensions of brain integration. Spatially, slow oscillations integrate large neural networks whereas fast oscillations synchronize local networks [
72]. Temporally, slow neural fluctuations are related to the accumulation of information over long timescales across higher order cortical regions [
73]. In line with these hypotheses, empirical evidence has indicated that slow oscillations in the delta range are important for higher cognitive functions that require large-scale information integration (see Güntekin and Başar [
74] for a review). Delta activity has been shown to play important roles in language comprehension such as chunking words into meaningful syntactic phrases [
75]. Slow wave activity (SWA) also facilitates memory consolidation during sleep by orchestrating fast oscillations across multiple brain regions [
76]. It may therefore be hypothesized that cognitive impairments in AD are related to alterations in slow oscillatory activity. Accumulating evidence has supported this hypothesis, showing that decreased delta responses following cognitive stimulation may serve as a general electrophysiological marker of cognitive dysfunction including MCI and AD [
74]. The present findings add to this line of research showing that the patterns of slow EEG fluctuations, as characterized by VG features, reflect neural/cognitive abnormalities in AD. Specific to this word repetition paradigm, Xia et al. [
77] have shown that the vast majority of the memory-related P600 word repetition effect is mediated by slow oscillations in the delta band. Modulation of alpha band power, in comparison, is associated with semantic processing of congruous and incongruous words. Alpha suppression was found to be greater for New than for Old words [
10]. The P600 (delta activity) and alpha suppression effects reflect different aspects of verbal memory processing, and each uniquely contributes to predicting individual verbal memory performance [
77].
An interesting finding in the present study is that the Old Congruous condition (words that are semantically congruous to the preceding category statements on repeated trials) produces the highest number of features. Our previous ERP studies and many behavioral studies have shown that old words are processed very differently from new words in normal elderly, due to their intact memory function, but much less so in AD patients. EEG channels producing the highest number of features were Fz, F8, R41, Pz, Br, Wl and O1. In the PCA comparison in
Figure 4 and
Table 4, we see this trend continue across even the different comparisons (all classes, pAD vs. RNE, AD vs. RNE). Several of these channels are known to be sensitive to word repetition and congruity manipulations in pAD patients. For example, the N400 brain potential usually becomes smaller when an incongruous word is repeated, i.e., the N400 repetition effect, and the effect is typically largest over midline and right posterior channels including Cz, Pz, Wr, R41 and T6 [
7,
8,
13]. The P600 ERP usually becomes smaller when a congruous word is repeated, i.e., the P600 congruous repetition effect, and the effect is widespread and largest over the midline channels with a peak typically near Pz [
7,
8,
13]. These ERP repetition effects are consistently found to be reduced or abnormal in MCI patients [
7,
8], and severely diminished in AD patients [
13] compared to RNE, although they still appear in our comparison. The consistency across studies in channel locations where group differences were found suggests that the VG features may capture the underlying brain mechanisms related to the ERP repetition effects.
We now list strengths and limitations of our study. One of the strengths of our study is our 100% accuracy with all classifiers on AD vs. RNE which demonstrates the effectiveness of the features our method extracts. Linear separability after PCA implies that, even before dimension reduction, AD vs. RNE is still a linearly separable comparison; indeed,
Figure 4 explicitly demonstrates this. Additionally, classification accuracy of 92.5% on pAD vs. RNE with non-linear neural networks and similar accuracies with linear classifiers using
only the features extracted from AD vs. RNE highlights how these generalizable features alone may be sufficient for high-accuracy, near-linear classification of these two groups that remains competitive with other published EEG-based work that explicitly extracts features for pAD vs. RNE classification [
60,
61,
62,
63,
64,
65,
66]. This strength also likely comes from looking at word repetition EEG tasks which have been shown to be sensitive to detecting MCI-to-AD conversion and preclinical AD using ERPs [
7,
8,
11]. Furthermore, our code is open source and linked in the paper so that future work can build upon our strong results and apply it to other datasets and tasks.
A potential limitation of the present study is the down-sampling procedure used for data reduction. Averaging EEG data across non-overlapping 80 ms time windows is effectively similar to lowpass filtering the data to 12.5 Hz, which would have reduced the amount of information in higher frequencies including beta band and above. This procedure most likely limited our ability to find discriminative VG features in these higher frequency bands. It is also worth noting that, in the present study, we used EEG time series averaged across trials for VG conversion. Cross-trial averaging is commonly used in ERP analyses to increase the signal-to-noise ratio in EEG data and extract activity that is evoked by, and phase-locked to, experimental stimuli. This averaging procedure, although highly effective as demonstrated in the present study, ignores EEG activity that is related, but not phase-locked, to the stimuli. With greater computing power, it would be valuable for future studies to identify discriminative VG features from higher frequency bands and non-phase-locked activity.
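The arithmetic behind the 12.5 Hz figure can be checked with a short sketch. Averaging over an 80 ms window acts as a moving-average (boxcar) filter whose first spectral null falls at 1 / 0.08 s = 12.5 Hz, and keeping one value per window yields 12.5 samples per second. The original sampling rate is not stated in this section, so the 250 Hz value below is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math


def block_average(samples, window):
    """Average consecutive non-overlapping blocks of `window` samples."""
    n_blocks = len(samples) // window
    return [
        sum(samples[b * window:(b + 1) * window]) / window
        for b in range(n_blocks)
    ]


def boxcar_gain(freq_hz, window, fs_hz):
    """Magnitude response of a `window`-sample moving average at freq_hz."""
    x = math.pi * freq_hz / fs_hz
    if math.isclose(math.sin(x), 0.0, abs_tol=1e-12):
        return 1.0  # limiting value at DC (and at multiples of fs_hz)
    return abs(math.sin(window * x) / (window * math.sin(x)))


FS_HZ = 250.0                  # ASSUMED sampling rate, not given in this section
WINDOW = round(0.080 * FS_HZ)  # 80 ms window -> 20 samples at 250 Hz

for f in (2.0, 6.25, 10.0, 12.5):
    print(f"{f:5.2f} Hz -> gain {boxcar_gain(f, WINDOW, FS_HZ):.3f}")
```

Under these assumptions the gain falls from close to 1 in the delta range to about 0.64 at 6.25 Hz and to essentially zero at 12.5 Hz. Because the averaged series has only 12.5 samples per second, frequencies above 6.25 Hz cannot be represented unambiguously after down-sampling, which is consistent with the reduced sensitivity to higher frequency bands noted above.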
Another limitation is the small sample size used in our classification tests, feature extraction and statistical analysis (15 AD, 15 pAD, 11 RNE). We mitigate this issue in two ways: (1) we report classification scores as the average over 100 trials of training on 85% of the data and testing on the remaining 15% (reporting only the testing accuracy), and (2) we verify the feature extraction step by using only AD vs. RNE features to classify pAD patients, demonstrating generalization of those features. Despite this, further replication of these results on larger datasets would be beneficial to the field. In such studies, it could be useful to perform data augmentation, reduce overfitting by regularizing training (e.g., weight decay or dropout) or try different network architectures (such as graph neural networks) to achieve even better generalization results. An additional limitation is that we did not require amyloid biomarker studies in the definition of our clinically defined subject groups, who were well-characterized by expert clinicians and longitudinal cognitive testing.
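The repeated 85%/15% evaluation described in (1) can be sketched as follows. This is a simplified illustration, not our released code. The nearest-centroid classifier is a stand-in for the classifiers used in the study, the splits are unstratified (this section does not specify whether the original splits were stratified), and the toy feature vectors are synthetic.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def repeated_holdout_accuracy(X, y, n_repeats=100, test_frac=0.15, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(test_frac * len(X)))
    scores = []
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i]
                   for i in test_idx)
        scores.append(hits / n_test)
    return mean(scores)


# Synthetic, well-separated toy features with the AD vs. RNE group sizes.
X = [[float(i % 3), 0.0] for i in range(15)] + \
    [[10.0 + i % 3, 1.0] for i in range(11)]
y = ["AD"] * 15 + ["RNE"] * 11
print(repeated_holdout_accuracy(X, y))
```

With 26 subjects, each test split contains only about four subjects, so a single split gives a coarse accuracy estimate. Averaging over many random splits reduces this variability but does not replace validation on a larger independent sample.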
In summary, this paper extends the results of prior studies on the use of visibility graphs for finding distinguishing features between and classifying Alzheimer’s and RNE groups [
15,
16] to word repetition tasks on both AD and pAD with a novel set of features. Distinguishing between pAD and RNE groups has historically produced poorer classification accuracy in the literature; however, this paper provides novel features for this type of classification that discriminate between pAD and RNE with competitive accuracy on our dataset (92.5%) simply by generalizing AD vs. RNE features. Although we achieve perfect 100% accuracy on the AD vs. RNE task and demonstrate its generalization, a study with a much larger sample size is still required to verify the efficacy of our framework. Because all of the code is open source, this experiment can be readily applied to much larger datasets; future applications could include predicting conversion in MCI and discriminating between different dementia pathologies. In future work, we plan to apply our framework to larger AD and MCI datasets, and also to test similar frameworks in preclinical AD.