1. Introduction
A considerable proportion of elderly people worldwide are affected by Alzheimer’s disease (AD), a widespread neurological illness and the most common form of dementia [1]. The illness causes a considerable decline in cognitive function, making it impossible for people to live independently. As a result, sufferers depend on their loved ones’ support to maintain their functioning [2]. AD is caused by a combination of genetic and environmental factors, including head injuries and chemical exposure. Memory loss, cognitive decline, communication problems, mood swings, and behavioral changes are among the symptoms that define the illness. It is a progressive disorder with a pre-clinical stage, and its rate of progression can vary. The prognosis for people with AD is often poor, and the behavioral abnormalities brought on by the illness can make it difficult for patients and their caregivers to function socially [3]. Changes in the structure of neurons are part of the pathogenesis of AD. The microtubules in neurons act as conduits for the delivery of chemicals and nutrients to axons, and TAU protein stabilizes these microtubules. In AD, the TAU protein undergoes chemical alterations that cause it to pair with other TAU proteins. This process results in the formation of neurofibrillary tangles, causing neuronal collapse, cellular malfunction, and eventually cell death. Additionally, beta-amyloid plaques, known as senile plaques, and cerebrocortical atrophy are present, further hindering efficient information transmission [4,5]. The first brain regions affected in AD are the hippocampus and medial temporal lobe, which are involved in cognitive processes. These areas show a reduction in neuronal and synaptic density, contributing to cognitive decline. Magnetic resonance imaging (MRI) can visualize the atrophy of the hippocampus and other brain regions associated with memory processing and executive functions [6].
Figure 1 presents the prominent brain atrophy that is evident in individuals diagnosed with mild cognitive impairment (MCI) and AD [7]. This atrophy was not observed in healthy individuals.
In Figure 1, the hippocampal tissue of MCI subjects, highlighted with the red arrow, is smaller than that of cognitively normal (CN) subjects and is smaller still in AD subjects. The ventricles, marked with the red stars, enlarge markedly as the disease advances. A decrease in the amount of gray matter in the cerebral cortex can also be observed when magnified images of AD subjects are compared with those of CN subjects.
MRI is a medical imaging modality that produces high-resolution images in which the differences between brain tissues can be visualized [8]. Morphometric analysis obtains quantitative data by evaluating the geometric features of objects in processed MRI images. Combining MRI with morphometric analysis makes it possible to assess the volume, morphology, and other geometric characteristics of cerebral regions, and the combination is therefore used extensively in diagnosing cerebral disorders and formulating treatment strategies. Voxel-based morphometry (VBM) is the most widely known morphometric method. VBM calculates the density or concentration of gray matter in a voxel-wise manner [9]. Two other morphometric methods, deformation-based morphometry (DBM) and tensor-based morphometry (TBM), use similar measurement techniques to characterize differences in brain shape. In DBM and TBM, images are registered to a common reference space and analyzed using the parameters of the deformation fields or measurements derived from them [10]. Surface-based morphometry (SBM) is another morphometric method, used to analyze the surface properties of the cerebral cortex. The cerebral cortex can be represented with a spherical model, and the features in this model (thickness, fold depth, and surface area) can be analyzed statistically [11]. Among the methodologies available for neuroimaging analyses and clinical trials, TBM stands out as a highly reliable and objective measure that is well suited to high-throughput imaging [12].
Machine-learning (ML)-based systems have been successfully applied in various fields such as energy [13], robotics [14], health [15], and transportation [16]. These systems have demonstrated potential for assisting radiologists and physicians in the timely identification and categorization of AD via computer-aided diagnosis (CAD) systems [17]. Timely diagnosis and accurate analysis of brain atrophy are crucial, and automated detection of brain atrophy can contribute greatly to both goals. It can also improve radiologist efficiency by delivering accurate results more quickly. Various ML techniques, including feature extraction, deep networks, and transfer learning (TL), have been proposed for the classification of AD and MCI [18,19,20,21]. Deep learning (DL), which is built from layers of artificial neurons, has demonstrated superior performance in handling complex classification tasks compared to traditional ML methods [22,23]. Convolutional neural networks (CNNs), a specialized DL technique, have been widely utilized for AD diagnosis [24,25]. The primary objective of the present research is to devise a highly effective approach for the timely detection of patients in the MCI phase prior to their progression to the AD stage. Encouraged by the aforementioned findings, we aimed to contribute to the detection of AD by combining DL and TBM methods.
This study introduced a novel method that utilizes cutting-edge architectures (Xception, VGG16, VGG19, and ResNet-50V2) to accurately detect AD and classify the different stages of dementia (MCI and AD). The proposed model, incorporating the Xception architecture-based deep dense block, was comprehensively evaluated via comparison with modern DL techniques.
The main aim of this article is to develop an effective method for early diagnosis of patients in the MCI stage before they progress to the AD stage. Existing studies on Alzheimer’s diagnosis are limited, mostly focusing on traditional machine-learning-based methods for feature extraction from raw/semi-processed MRI images. In this study, we analyzed processed TBM images statistically. While morphometric images tend to provide better results for disease diagnosis compared to raw/semi-processed MRI images, they are challenging to interpret visually. Therefore, the use of DL-based methods, which enable automatic extraction of disease-specific features from difficult-to-interpret images, will address a significant research gap. The success of DL-based methods can assist physicians in diagnosing diseases using morphometric images. Hence, our goal was to contribute to AD detection by combining DL and TBM methods. Our study’s results demonstrate that the proposed method, based on deep TL and TBM analysis, achieves accurate classification of three different classes and exhibits promising performance. The key highlights of our article include the following:
Because MRI scans are inherently three-dimensional, they can be conceptualized as a stack of 2D MRI slices. From this stack, we selected the most informative slices for classification.
MCI is a transitional stage between CN and AD and is therefore difficult to diagnose. To classify the medical images, we employ transfer learning with models pre-trained on a large dataset.
Transfer learning is used because only limited data are available, and it helps to reduce the cost of the learning process.
Considering that morphometric images tend to yield more successful outcomes in disease diagnosis compared to raw or semi-processed MRI images, DL-based methods are employed for automatic feature extraction in the analysis of TBM images.
The rest of this paper is structured as follows:
Section 2 provides a summary of related studies.
Section 3 presents the CNN architecture and suggested work.
Section 4 discusses the results and experiments, and
Section 5 presents the conclusions of this study.
2. Related Work
AD is a prevalent neurodegenerative disorder, and its early diagnosis is of the utmost importance. Consequently, various models and techniques have been introduced by the research community to facilitate the timely identification of AD. This section presents a review of the diverse deep-learning-based approaches utilized for the identification of AD.
Traditional machine-learning methods were the first to appear [26,27]. Kumari et al. employed a random forest ensemble classifier with adaptive hyperparameter tuning (HPT-RFE), a novel approach that performs faster than conventional ML algorithms. They used MRI, FDG-PET, and PIB PET data from 102 participants in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database to perform binary classification (NC/AD: 100%; NC: 91%; AD/MCI: 95%) [28]. By transforming the 3D sMRI images from the ADNI database into 2D matrices and then subjecting them to a series of processing steps, Gunawardena et al. were able to detect AD at the MCI stage with an SVM-based model and achieved 96% using a deep-learning-based method. The CNN-based strategy outperformed the SVM-based method in detecting AD using sMRI data [29]. As an alternative to conventional information extraction techniques, Cruz et al.’s architecture, called HerstonNet, uses a 3D ResNet-based neural network regression model to extract significant characteristics from brain morphometric MRI data. Compared with non-DL techniques, HerstonNet increased the consistency of the morphometric characteristics by 6.09% for volume, 21.73% for thickness, and 43.15% for mean curvature [30]. In a different study, Savas experimented with various CNN-based pre-trained architectures to categorize 2182 images taken from the ADNI database. According to the results, the EfficientNetB3 architecture achieved the best accuracy [31]. Turkson et al. used MRI scans to pre-train unsupervised convolutional spiking neural networks (SNNs) for binary classification. They subsequently created a pipeline architecture that feeds the SNN output into a supervised deep CNN to perform AD/MCI/NC classification tasks. They compared the classification results produced using only a CNN with those produced using this architecture. The classification performance of the SNN + CNN pipeline design was significantly better than that of the CNN-based model alone [32].
For the multiclass classification of AD, Farooq et al. [33] presented the deep-learning architectures GoogLeNet, ResNet-18, and ResNet-152. Four classes (AD, LMCI, MCI, and CN) with MRI counts of 33, 22, 449, and 45, respectively, were used in the experiments. GoogLeNet, ResNet-18, and ResNet-152 achieved accuracies of 98.88, 98.01, and 98.14%, respectively. Xia et al. [34], using AD (198), CN (229), and MCI (408) data, employed a 3D CLSTM to extract deep salient features and recognized 94.19% of AD cases. Using several CNN architectures, Ashraf et al. [35] fine-tuned features and reported a recognition rate of 99.05% for the diagnosis of AD. In [36], tissue segmentation was used to extract gray matter tissue from each patient. A binary classifier based on the VGG architecture was then applied, producing a recognition rate of 98.73% for AD versus NC. In addition to research that uses raw or normalized MRI images as inputs to CNNs, research that employs images processed for morphometric analyses is also available [37]. A hybrid approach based on VBM and quantitative susceptibility mapping, an MRI technique for calculating magnetic susceptibility in tissues, was proposed by Sato et al. for the early identification of AD. Their method classified the MCI and NC groups more effectively than the conventional VBM-based method. However, the performance of their approach on MCI/AD classification (68%) was lower than anticipated [38].
4. Results and Experiments
In this section, we outline the experimental setup and the results for the four models used in this study. We evaluated the performance of the proposed deep TL model alongside that of the competing models.
4.1. Experimental Settings
The deep transfer-learning models tested in this manuscript were implemented in Python 3.10.10 using the Keras library within TensorFlow 2.12. Keras is an open-source, high-level neural network library written in Python that runs on both GPU and CPU. The models were trained on a 64-bit machine with a 4-core processor running at 2199 MHz, 32 GB of memory, and an Nvidia Tesla P100 GPU.
4.2. Hyperparameter and Optimization Techniques
The entire dataset is divided into approximately 66% training, 17% validation, and 17% testing. The validation and test proportions were chosen to be close to each other, in accordance with the literature [53]. The parameters that can impact model training are referred to as hyperparameters. To preserve the pre-learned filters, the convolutional base of the pre-trained architectures was entirely frozen, meaning that the weights of these layers remained unchanged during training. The convolutional base is frozen by setting its trainable attribute to False. The Adam optimizer was used to train our models for 200 epochs at a learning rate of 1 × 10⁻³. The training hyperparameters are presented in Table 4.
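The training configuration described above (a frozen convolutional base, the Adam optimizer at a learning rate of 1 × 10⁻³, and a categorical cross-entropy loss) could be set up in Keras roughly as follows. This is a minimal sketch rather than the authors' implementation: the function name build_model, the input size, and the width and dropout rate of the classification head are assumptions made for illustration only.

```python
import tensorflow as tf


def build_model(num_classes=3, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen Xception base plus a small dense head (head sizes are assumed)."""
    base = tf.keras.applications.Xception(
        include_top=False,        # drop the original ImageNet classifier
        weights=weights,          # "imagenet" for transfer learning, None for random init
        input_shape=input_shape,
        pooling="avg",            # global average pooling: one feature vector per image
    )
    base.trainable = False        # freeze the convolutional base

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)                       # keep batch norm in inference mode
    x = tf.keras.layers.Dense(256, activation="relu")(x)   # hypothetical layer width
    x = tf.keras.layers.Dropout(0.5)(x)                    # hypothetical dropout rate
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With the base frozen, only the weights of the dense head are updated during training, which keeps the number of trainable parameters small when little data is available.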
Throughout the training process, we applied various data augmentation techniques, including a zooming range of 20%, horizontal flipping, and a rotation of 45°. These augmentation operations aimed to enhance the dataset and reduce the risk of overfitting (Table 5).
Subsequently, normalization was performed to facilitate learning. This technique reduces computational complexity by rescaling pixel values to the range of 0 to 1. For the multiclass classification problem, the “categorical_crossentropy” loss function and the “accuracy” metric were used. To prevent overfitting, early stopping was applied, halting the training process if the validation accuracy did not improve for a predetermined number of epochs.
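The two steps described above, rescaling and early stopping, can be illustrated with a short framework-free sketch. It assumes 8-bit pixel intensities (0 to 255), and because the text does not state the patience value, patience is a hypothetical parameter here; the function names are likewise illustrative.

```python
def rescale_pixels(pixels, max_value=255.0):
    """Rescale raw intensities to the range 0..1 (assumes 8-bit input)."""
    return [p / max_value for p in pixels]


def early_stopping_epoch(val_accuracies, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation accuracy has failed to improve on
    its best value for `patience` consecutive epochs.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best + min_delta:
            best = acc      # new best value: reset the counter
            wait = 0
        else:
            wait += 1       # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_accuracies)  # never triggered: ran all epochs


history = [0.70, 0.80, 0.80, 0.79, 0.80, 0.85]   # made-up validation accuracies
print(early_stopping_epoch(history, patience=3))  # 5
```

In Keras, the same behavior is provided by the EarlyStopping callback with monitor="val_accuracy".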
4.3. Experimental Results
In this study, as mentioned in Section 3.3, TL was employed to train all DL models. We tested four deep TL models and evaluated their performances based on the indicators described in Section 3.4. The purpose of this research was to evaluate the effectiveness of the proposed deep TL model for the detection of AD and its early stages and to compare its performance with the most advanced CNN models reported in the existing literature. To this end, we carried out a comparative analysis of the models, examining the average F1 score, sensitivity, specificity, and accuracy of each model on the test dataset. According to the results presented in Table 6, the Xception + PPC model demonstrated the highest overall performance on the dataset, with an accuracy of 95.81%. Moreover, this model exhibited the highest sensitivity, specificity, precision, and F1 score, with values of 95.41%, 97.92%, 95.01%, and 95.21%, respectively. High sensitivity is crucial for minimizing the misdiagnosis rate of volumetric changes in brain tissue. These findings indicate that Xception and PPC can effectively differentiate between Alzheimer’s stages. The Xception architecture uses depth-wise separable convolution, which is computationally more efficient than traditional convolution. Depth-wise separable convolutions offer superior expressiveness and efficiency compared with classical convolutions. By incorporating depth-wise separable convolutions, the Xception model becomes highly proficient at learning distinct and high-level features that may be overlooked by simpler models. Another successful model is the ResNet-50V2 + PPC model, which achieves an accuracy of 93.35%, with sensitivity, specificity, and F1 scores of 92.81%, 96.59%, and 92.57%, respectively.
Figure 5 shows the accuracy achieved on the test dataset for all models. The test accuracy depicted in Figure 5 is computed by dividing the number of correctly classified patients (CN, MCI, and AD) by the total number of patients. Figure 5 clearly shows that the Xception + PPC model outperformed the other three models.
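The efficiency of depth-wise separable convolution mentioned above can be made concrete by counting weights. The sketch below compares a standard k × k convolution with a depth-wise separable one (a k × k depth-wise filter per input channel followed by a 1 × 1 point-wise convolution). Bias terms are ignored, and the layer sizes in the example are arbitrary illustrative values, not taken from the Xception configuration used in this study.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution (biases ignored).

    Depth-wise step: one k x k filter per input channel.
    Point-wise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    return k * k * c_in + c_in * c_out


# Illustrative layer: 3 x 3 kernel, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)
sep = separable_conv_params(3, 128, 256)
print(std, sep)  # 294912 33920
```

For this example layer, the separable form needs roughly one ninth of the weights of the standard convolution.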
Table 7 presents the performance of the models by class. The best results are bolded. For training purposes, 1936 medical images were utilized, while 406 additional images were designated for testing. The analysis included three classes: AD, MCI, and CN. The results table shows that the Xception + PPC model performs strongly across all classes, achieving an average precision, sensitivity, and F1 score of 0.95. For the MCI class, the model attained a precision of 0.98 and a sensitivity of 0.97. These findings demonstrate the model’s capability to attain a high level of accuracy in diagnosing the disease, particularly in the early stages of Alzheimer’s disease, such as the MCI stage. In terms of the macro-average scores of all evaluation metrics, the Xception + PPC model outperformed the other models.
Figure 6 shows the confusion matrix, which provides a detailed overview of the class-wise results for the Xception + PPC model. By examining the confusion matrix, we assessed the number of correctly classified and misclassified images for each class. The confusion matrix shows that the Xception + PPC model misclassified only 17 images from the test dataset: of the 406 test images, 389 were classified correctly (389/406 ≈ 95.81%). Therefore, based on the evaluation metrics achieved by the proposed model, it can be inferred that Xception + PPC outperformed the other models in all aspects.
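As a sketch of how class-wise metrics follow from a confusion matrix, the code below derives sensitivity, specificity, precision, and F1 score for each class in a one-vs-rest fashion. The 3 × 3 matrix is a made-up example for illustration and is not the matrix shown in Figure 6; only the final line uses figures from the text (389 correct out of 406 test images). The function names are illustrative.

```python
def per_class_metrics(cm):
    """Per-class metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as something else
        fp = sum(cm[r][k] for r in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        results.append({"sensitivity": sensitivity, "specificity": specificity,
                        "precision": precision, "f1": f1})
    return results


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


toy_cm = [[50, 2, 0],   # hypothetical counts, rows = true class
          [3, 45, 2],
          [0, 1, 47]]
for name, m in zip(["class 0", "class 1", "class 2"], per_class_metrics(toy_cm)):
    print(name, {key: round(value, 3) for key, value in m.items()})

print(round(100 * 389 / 406, 2))  # 95.81, the accuracy reported in the text
```

Macro-averaged scores are obtained by averaging each metric over the classes.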
An ideal receiver operating characteristic (ROC) curve has an area under the curve (AUC) of 1: the true positive rate (TPR) rises to 1 while the false positive rate (FPR) remains near 0. In this respect, the proposed approach demonstrated superior classification capability compared to the alternative methods, with an average AUC of 0.97 and an AUC of 0.98 for MCI prediction (Figure 7). The effectiveness of this method is also apparent when the confusion matrix is examined.
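For readers unfamiliar with how an AUC value is obtained, the following sketch computes the area under the ROC curve for one class against the rest using the trapezoidal rule. The function name roc_auc and the labels and scores in the example are made up for illustration and are unrelated to the data of this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule.

    labels: 1 for the positive class (e.g., MCI), 0 for all other classes.
    scores: predicted probability of the positive class for each sample.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both positive and negative samples are required")

    # Sweep the decision threshold from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid between ROC points
        prev_tpr, prev_fpr = tpr, fpr
    return auc


print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0 (perfect separation)
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
```

An average AUC over the three classes is then the mean of the one-vs-rest values.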
4.4. Performance Evaluation in Relation to Baseline Models
Our study aimed to evaluate the effectiveness of the proposed deep TL model for diagnosing AD and its early stages and compare its performance with that of state-of-the-art CNN models reported in the literature. The classification performance of the models was assessed in terms of accuracy, sensitivity, precision, and the F1 score. The proposed Xception + PPC model exhibited superior accuracy compared with the baseline model, achieving a 6.65% increase. Furthermore, the VGG-16 + PPC, VGG-19 + PPC, and ResNet-50V2 + PPC models demonstrated accuracy improvements of 3.7, 4.19, and 5.42%, respectively. The incorporation of multiple dense layers in the PPC contributed to enhanced learning ability and accuracy. PPC also exhibited improved detection rates and stability for AD classification. The baseline models showed lower performance on the test set, with accuracies ranging from 85% to 89% as well as lower sensitivity and F1 scores. Several factors, including dataset variations, overfitting, and challenges in feature extraction from TBM images, contribute to the underperformance of base models. A comparison of the computational costs revealed that the Xception + PPC model had a longer training time (3130.86 s) than the base model. However, our primary focus was on improving the accuracy of the method, which showed significant enhancement after the incorporation of PPC. Overall, our proposed method outperformed the baseline models in terms of classification performance and demonstrated potential for accurate AD diagnosis (
Table 8).
4.5. Comparison with Related Works
Morphometry-based studies are limited in the literature, and research on the application of deep-learning methods to the early detection of AD is scarce. Furthermore, existing studies have predominantly focused on VBM analyses, with only a few studies using TBM images. The effectiveness of the proposed model based on the Xception + PPC architecture was compared with that of competing models. Because many studies have used the ADNI database for classification, we selected it for our research so that the results are comparable. Accuracy was used as the main criterion for evaluating the classification outcomes. Table 9 compares the proposed model with other studies in the literature that use the same dataset, demonstrating its superior performance with a 95.81% accuracy rate on the three-class dataset. In addition, most studies have focused on binary classifications. To the best of our knowledge, no study has focused on the early diagnosis of AD using deep TL-based TBM analysis for multiclass classification. The proposed model exhibits the capability to effectively address a three-class problem.
Turkson et al. [32], despite utilizing a unified architecture, achieved a highest accuracy rate of 87% using raw MRI data. The present study achieved a higher success rate than methods that used raw datasets, in line with other related studies. Similarly, it has been observed that morphometric methods achieve higher accuracy rates than raw MRI datasets [54,55]. In [55], the VGGNet base architecture was used for VBM-based analysis, resulting in 96% accuracy. In this study, accuracy rates of 85% and 87% were obtained using the baseline VGGNet16 and VGGNet19, respectively. This finding suggests that a VBM-based analysis using base models may yield more successful results than a TBM-based analysis.
4.6. Strengths and Limitations
So far, studies have generally focused on binary classification problems for Alzheimer’s diagnosis. This study specifically addresses the multiclass AD/MCI/CN classification problem and was conducted using a publicly available dataset. Because the number of images in the dataset is limited, data augmentation was employed to mitigate overfitting and to help extract disease-specific features.
The proposed model utilizes a deep neural network and does not require separate feature extraction. Various databases are utilized in the literature for AD diagnosis. However, collecting data from entire neuroimaging databases can be challenging. Additionally, neuroimages are processed using different methods. In this study, an ROI-based method was implemented to improve the accuracy rate. Furthermore, performance enhancements were attempted via transfer learning.
Morphometric images may possess higher dimensions compared to normal MRI images, thereby requiring increased memory and processing power. Furthermore, manually labeling such datasets can be arduous. Hence, expert knowledge and time-consuming manual procedures may be necessary for generating labeling data.
Despite the favorable performance of CNNs in medical image analysis, some issues remain. Limited data availability is particularly problematic in the field of medical image processing. To mitigate this, a large database was preferred for this study.
Although the transfer-learning method employed in this study boasts numerous advantages, a neural network based on the complex structure of the Xception architecture presents challenges. These complex model structures can affect the method’s applicability, including training, hyperparameter tuning, and computational resource requirements. To overcome these limitations, successful adaptive methods such as the Adam optimizer were utilized.
5. Conclusions
The primary objective of this study was to develop an automated DL method for the early detection of AD. Determining the disease stage presents a significant challenge because of the high similarity between AD stages. To overcome this challenge, we employed morphometric methods and conducted experiments on a three-class classification problem. All images were preprocessed using image-processing techniques. For AD detection using TBM images, we adopted four popular deep-learning architectures based on the deep TL technique. The final layers of the examined architectures were extended with deep dense blocks and a softmax layer to enhance classification performance. Specifically, our proposed model, based on the Xception architecture, utilizes depth-wise separable convolution, enabling the efficient learning of distinctive and high-level features. The incorporation of a deep dense block further enhances the performance of the model. Data normalization, data augmentation, and dropout were employed to mitigate overfitting, whereas the Adam optimizer ensured fast learning. The proposed model obtains an overall classification accuracy of 95.81% on the dataset used, outperforming both the other models evaluated and existing models reported in the literature. In future work, we intend to expand the dataset by incorporating additional TBM brain image data, including sagittal and coronal images, while maintaining performance standards. Moreover, we plan to enhance our architecture further by conducting experiments using different parameter settings.