1. Introduction
A computer-aided diagnosis system (CADS) helps to automate the diagnosis of diseases. Clinicians are increasingly using CADS to help them detect and interpret diseases. CADS identifies regions of an image that may reveal certain problems and notifies doctors during image interpretation [1]. Typically, CADS includes pre-processing, feature extraction, and classification [2].
Alzheimer’s disease (AD) is a neurological disease most commonly linked to memory and cognitive loss. Neurodegenerative disorders are not curable [3]. The goal of medicine is to enhance patients’ well-being and slow the progression of the disease. However, early diagnosis may provide AD patients with better treatment outcomes than patients whose disease is discovered late. Mild cognitive impairment (MCI) is an initial stage of AD, characterized by memory loss or a decline in cognitive skills, such as language or vision, that is greater than expected for a person’s age but not severe enough to cause dementia. Studies show that individuals with MCI are more likely than people with normal cognitive abilities to develop Alzheimer’s disease within a few years [4]. According to the National Institute on Aging [5], dementia is estimated to develop in 10% to 20% of MCI patients aged 65 or older within one year.
Brain imaging can detect MCI before clinical symptoms appear [6]. Researchers use medical imaging modalities to diagnose AD and MCI, including positron emission tomography (PET) and magnetic resonance imaging (MRI) scans. These studies classify MCI based on its biomarkers (indicators that the disease exists), such as decreased grey matter volume.
Figure 1 illustrates how the grey matter volume in an AD patient’s brain is less than that in a healthy brain, whereas the grey matter volume for the MCI patient is less than that of the healthy brain but more than that of the brain affected by AD.
Another significant biomarker related to MCI is the shrinkage of the hippocampus and entorhinal cortex (EC). The EC is particularly important in the early detection of MCI, as studies have demonstrated that it shrinks before the hippocampus does [8]. However, few studies have explored the EC area for the diagnosis of MCI, as its small size makes it challenging to analyze and difficult to assess with the human eye.
Recently, machine learning approaches have become more widely used to automatically classify medical images [9,10,11]. However, the feature engineering step in machine learning approaches is time-consuming and requires expert knowledge. In contrast, deep learning models can automatically learn relevant features from the raw data, eliminating the need for explicit feature engineering. A recent study [12] employed a mathematical approach called ellipse fitting to detect abnormalities in medical images. However, the ellipse fitting algorithm is sensitive to outliers when the organ being analyzed has an irregular shape, which can significantly reduce its accuracy.
Significant advancements have been made in the field of brain tumor detection. Deep learning algorithms have been used to analyze brain images and identify patterns of brain tumors. Recent studies have demonstrated that convolutional neural networks (CNNs) are highly accurate at detecting brain tumors [13,14,15]. In addition, deep learning algorithms have been shown to distinguish between various forms of brain tumors, assisting medical professionals in determining the best strategy for treatment [16].
The Inception architecture has demonstrated promising outcomes in brain studies, achieving notable performance in brain tumor detection [17,18]. Several advanced techniques were adopted in the Inception architecture, including factorized convolution and auxiliary classifiers. In factorized convolution, the number of parameters in the network is reduced by factorizing a large convolutional filter into two smaller ones, minimizing the overall complexity of the model. Auxiliary classifiers are classifiers added to the network at intermediate levels to avoid overfitting and enhance prediction accuracy.
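The parameter saving from factorization can be checked with simple arithmetic. The sketch below compares a single 5×5 convolution against its factorized equivalent of two stacked 3×3 convolutions; the channel count is an illustrative value, not taken from this study.

```python
# Parameter count of a single 5x5 convolution vs. the factorized
# equivalent (two stacked 3x3 convolutions), ignoring biases.
channels = 64  # input and output channels, chosen for illustration

params_5x5 = 5 * 5 * channels * channels               # one 5x5 conv
params_factorized = 2 * (3 * 3 * channels * channels)  # two 3x3 convs

reduction = 1 - params_factorized / params_5x5
print(f"5x5: {params_5x5}, factorized: {params_factorized}, "
      f"reduction: {reduction:.0%}")  # 28% fewer parameters
```

The same receptive field (5×5) is covered with 18/25 of the parameters, a 28% reduction regardless of the channel count.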
Support vector machine (SVM) is a supervised classification algorithm that separates classes with a hyperplane (decision boundary) [19]. The distance between the hyperplane and the closest points is called the margin; maximizing this margin is what gives the SVM its robustness, making it less prone to classification errors. In practice, the arguments passed when creating the SVM classifier strongly influence the model outcome: the kernel, gamma, and C parameters must be tuned to achieve the highest accuracy possible. The combination of a CNN as a feature extractor and an SVM as a classifier has been shown to perform medical imaging tasks effectively [20].
Using pre-trained CNN architectures (such as VGG, Inception, and ResNet) to extract features of the EC is a powerful technique that can save time and resources compared to training a model from scratch. The effectiveness of this technique can be explained theoretically in the following ways: (1) transfer learning used as a feature extractor can capture both low-level and high-level visual information of the EC area because these models have been trained on large datasets, such as ImageNet, with roughly a thousand object classes. (2) By extracting the features of the EC area with pre-trained models, we reduce the chance of overfitting, since training a CNN from scratch on a small dataset is prone to overfitting.
In this study, we aim to perform automatic classification of MCI disease from MRIs using the EC area. To achieve this goal, we define the study objectives as follows: (1) construct a dataset for the EC area from MRIs with normal and MCI subjects; (2) investigate using different collections of MRI slices as inputs for the classification system; (3) explore different neural network architectures, including VGG16, Inception-V3, and ResNet50, to extract the features of the EC area; and (4) classify subjects using machine learning algorithms, including CNN and SVM. There are two important contributions of this study. First, this work expands on the studies conducted on the EC area since there is a limited body of knowledge addressing EC and linking it with neurological diseases. Second, to our knowledge, no dataset for the EC area is available. In this work, we used MRIs from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset to extract the left and right EC areas and use them as inputs for the proposed classification models.
The remaining sections of this document are organized as follows:
Section 2 presents the state-of-the-art classification systems for predicting MCI,
Section 3 explains the proposed system,
Section 4 reports the experimental results, and
Section 5 presents the conclusions and recommendations for further studies.
3. Proposed System
In this study, optimized deep learning systems are implemented to differentiate between MCI and NC subjects. The proposed classification system includes the following phases: MRI pre-processing, extraction of the EC area, feature extraction, and classification.
For the pre-processing phase, the first step is to align all MRI images to a standard template, the International Consortium for Brain Mapping template (ICBM152). Second, the MRIs are prepared by performing skull stripping, motion correction, intensity normalization, cortical surface segmentation, and parcellation. To extract the EC area, the Desikan–Killiany atlas is used. Next, the VGG16, Inception-V3, and ResNet50 networks are used as feature extractors for the EC area. Subsequently, the classification process is applied using two different classifiers: CNN and SVM.
Figure 2 depicts the workflow of the proposed prediction system.
3.1. Constructing the Dataset
To our knowledge, there is no public dataset available with a segmented EC to perform the experiments. For this reason, MRIs from the ADNI dataset [39] were used to segment the EC and construct the ground truth.
The size of the dataset is 779 3D MRIs, including MCI and NC MRIs. The number of participants included in this study is 188 participants, of whom 95 are NC and 93 have MCI. Each participant has one or more MRIs. The total number of MRIs for NC is 442, while the number of MRIs for the MCI is 337. The participants range in age from 55 to 90 years old.
Table 4 provides demographic information for the dataset included in this study.
The data included in this study are T1-weighted MRIs. The field strength of the MRIs is 3 Tesla, and the 3D images are in Neuroimaging Informatics Technology Initiative (NIfTI) format. The data used are in the original format, meaning no pre-processing was performed previously by the dataset provider to ensure total control over the data. The dimensions of each 3D MRI are (256, 256, 170).
3.2. Pre-Processing
The first step in pre-processing is MRI alignment. The objective of the alignment process is to center all the images such that, when a specific area is selected, the same area is used across all images. This step is crucial, as not all images have the same orientation or centering due to the manual screening completed by the physicians. For example, the skull stripping process could be negatively affected if the images are not centered, which could result in the tool extracting part of the skull because it cannot differentiate between the skull and the brain. Moreover, alignment corrects cases where the head is tilted forward or backward.
Typically, this is completed by linearly aligning the image to a standard template, ICBM152. In this work, we used the statistical parametric mapping (SPM12) tool [40] to perform the alignment process for all the MRIs.
Next, the 3D MRI volumes in NIfTI format were used to reconstruct the 2D cortical surface for analysis using FreeSurfer, an open-source tool that is publicly available online [41]. The pre-processing pipeline comprises thirty-one steps, grouped into three phases: Autorecon 1, Autorecon 2, and Autorecon 3. The output of each phase is the input for the subsequent phase, as shown in Figure 3. The first five steps in Autorecon 1 address the volume itself: skull stripping, motion correction, and intensity normalization. These steps are followed by Autorecon 2, which performs the segmentation of the cortical surface. The last phase, Autorecon 3, generates the statistical data and cortical parcellation.
3.3. Extracting the Entorhinal Cortex Area
To automatically extract the EC area, the Desikan–Killiany atlas was used. The brain regions are defined by an anatomical atlas. Various atlases are available, such as the Desikan–Killiany and Destrieux atlases; each atlas has different brain regions, which are segmented and painted with distinct colors. In this study, we used the Desikan–Killiany atlas because it contains the anatomical location of the EC area and uses a dataset of 40 MRI images for labeling ROIs in the left and right hemispheres [43].
Figure 4 lists the names of the regions in the Desikan–Killiany atlas, along with the axial, sagittal, and coronal views of those regions. The area painted with color code 1006 represents the left EC, while the area painted with color code 2006 represents the right EC.
The extracted 3D images are in the shape (256, 256, 256).
Figure 5 shows samples of the extracted left and right EC areas in this study for participants with and without MCI disease. The images show how the EC is significantly different in size and shape for the two groups. The images for participants without MCI show a normal size of the EC area, while the EC area for participants with MCI appears smaller in size.
3.4. Feature Extraction and Classification
The first step in this section is performing the feature extraction for the segmented EC images resulting from the previous step. This study uses the transfer learning concept for feature extraction of the EC area to improve the detection of features. The convolutional base (convolutional and pooling layers) of the previously trained models is used. Three CNN models are examined: VGG16, Inception-V3, and ResNet50.
Figure 6 illustrates the six experiments conducted in this study.
VGG16 [44] consists of sixteen weight layers: thirteen convolutional layers and three dense layers, plus five max pooling layers (which have no trainable weights).
Inception-V3 [45] is the second pre-trained model used. The main concept of the Inception architecture is going wider instead of deeper. Inception-V3 consists of forty-eight layers; the inception module block consists of three convolutional layers and one max pooling layer concatenated together.
ResNet50 [46] has fifty layers: forty-eight convolutional layers, one max pooling layer, and one average pooling layer.
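To make the VGG16 layer count concrete, the standard published configuration (3×3 kernels, channel widths per block) can be tallied in a few lines of plain Python; this reflects the canonical architecture, independent of this study's data.

```python
# Tally the weight layers and parameters of the standard VGG16
# architecture: 13 convolutional layers (3x3 kernels) + 3 dense layers.
conv_widths = [64, 64, 128, 128, 256, 256, 256,
               512, 512, 512, 512, 512, 512]      # 13 conv layers
fc_widths = [4096, 4096, 1000]                    # 3 dense layers

params, in_ch = 0, 3                              # RGB input
for out_ch in conv_widths:                        # 3x3 kernel + bias
    params += 3 * 3 * in_ch * out_ch + out_ch
    in_ch = out_ch

in_units = 7 * 7 * 512                            # flattened conv output
for out_units in fc_widths:                       # weights + bias
    params += in_units * out_units + out_units
    in_units = out_units

print(len(conv_widths) + len(fc_widths), "weight layers,",
      params, "parameters")  # 16 weight layers, 138,357,544 parameters
```

The tally reproduces the well-known figure of roughly 138 million parameters, most of which sit in the first dense layer that the feature-extraction setup above discards.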
The second step after feature extraction is classification. In this approach, two different classification methods are used, both performing binary classification (NC vs. MCI). The last fully connected layer of each pre-trained model is removed, as it is responsible for multiclass classification, which is not used in this approach.
The extracted features are in 3D form, so we flatten them before passing them to the classifier. We examined two classifiers for comparison purposes: the first is a CNN layer with a sigmoid activation function to perform the binary classification task (NC vs. MCI); the second is the SVM classifier. The 5-fold cross-validation technique is used for resampling the EC images. The data are divided into five folds; in the first iteration, the first fold is used as the test set and the remaining four folds are used for training the model. In the second iteration, the second fold is used as the test set and the remaining four folds as the training set. This procedure is repeated for each of the five folds, and the final result is the mean of the classification results obtained across all folds.
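The flatten-then-cross-validate procedure can be sketched with scikit-learn. The shapes and labels below are random stand-ins for the flattened EC feature maps, not the study's data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for the extracted EC feature maps: 100 samples of 3D
# features, flattened to one vector per sample (shapes illustrative).
features_3d = rng.normal(size=(100, 4, 4, 32))
X = features_3d.reshape(len(features_3d), -1)   # flatten before classifying
y = rng.integers(0, 2, size=100)                # 0 = NC, 1 = MCI

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=cv)    # one accuracy per fold
print(len(scores), "folds, mean accuracy:", scores.mean().round(3))
```

Each of the five folds serves once as the test set; the reported score is the mean across folds, matching the resampling scheme described above.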
4. Experimental Design and Results
The experiments conducted to validate the proposed methodology for the prediction of MCI disease are described in detail in this section. The evaluation metrics adopted to assess the performance of each experiment are introduced. Moreover, the experiments conducted to produce an efficient classification system capable of differentiating between MCI and NC samples are discussed. The experiments mainly focus on the brain slices used as inputs for the classification system, the feature extraction techniques, and the classifier. The following subsections clarify the objective of each experiment along with the results obtained.
Various metrics were used to evaluate the performance of each experiment, including accuracy, F1 score, sensitivity, specificity, and AUC [47]. For these evaluation metrics, a score of 1 indicates that all samples were predicted correctly, while a score of 0 indicates that none of the samples were predicted correctly. The following list provides the definition and equation for each metric used.
Accuracy measures the ratio of correctly classified samples to the total number of samples in the dataset.
F1 score measures the harmonic mean of precision and recall, which are defined as follows:
- Precision measures the ratio of correctly classified samples to all samples assigned to that class.
- Recall, also known as sensitivity or the true positive rate, measures the proportion of actual positive cases that are correctly identified by the classifier.
Specificity, also known as the true negative rate, measures the ratio of correctly classified negative samples to the total number of negative samples in the dataset.
Area under the curve (AUC) is calculated using the Receiver Operating Characteristic (ROC) curve, which is a graphical representation of the performance of a binary classification model. The ROC curve is a plot of the true positive rate against the false positive rate for the classifier at different threshold values. The AUC represents the overall performance of the classifier across all possible threshold values.
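These definitions reduce to simple ratios over the confusion matrix. The sketch below computes them in plain Python on a small toy prediction set (the labels are illustrative, not study results; AUC is omitted since it requires continuous scores rather than hard labels).

```python
# Metric definitions computed from a confusion matrix on toy labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]   # 1 = MCI, 0 = NC (illustrative)
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)            # sensitivity / true positive rate
specificity = tn / (tn + fp)      # true negative rate
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, specificity, f1)
```

On this toy set every metric happens to equal 0.8; on real, imbalanced data the metrics diverge, which is why all five are reported in the experiments below.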
4.1. Experiment 1: Investigate the Use of Different Collections of MRI Slices as Inputs for the Classification System
The original shape of the EC area after extraction with the Desikan–Killiany atlas is (256, 256, 256) for each MRI. Each 3D image is transformed into 2D images for visualization purposes, resulting in 256 images of shape (256, 256) for each MRI. After exploring the images, we observed that some images are uninformative for classifying MCI, since the EC area is either incomplete or cannot be observed in the slice, as shown in Figure 7. To obtain the highest accuracy for the classification model, various experiments were implemented using different groups of 2D MRI slices. The feature extraction network and the classifier were fixed for each experiment to isolate the influence of the MRI slice groups on the accuracy. The original size of the images was (256, 256) pixels; after cropping, the new size was (100, 70) pixels. The hyperparameters used for training the models included 50 epochs and the adaptive moment estimation (Adam) optimization method. Detailed information for each scenario is listed below.
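The slice-and-crop step can be sketched with NumPy. The volume here is a zero-filled stand-in for one extracted EC volume, and the crop offsets are hypothetical, since the exact coordinates are not stated in the text.

```python
import numpy as np

# Stand-in for one extracted EC volume of shape (256, 256, 256);
# real volumes would be loaded from the NIfTI files.
volume = np.zeros((256, 256, 256))

# Take one 2D slice per index along the third axis ...
slices = [volume[:, :, i] for i in range(volume.shape[2])]

# ... then crop each (256, 256) slice down to (100, 70). The crop
# offsets below are hypothetical, chosen only for illustration.
top, left = 80, 90
cropped = [s[top:top + 100, left:left + 70] for s in slices]
print(len(cropped), cropped[0].shape)  # 256 slices of shape (100, 70)
```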
Experiment 1.1: Include all MRI slices (slice 0–slice 255)
In the first experiment, we included all the MRI slices as inputs for the CNN classification system.
Experiment 1.2: Select the range of the MRI slices (slice 130–slice 140)
As discussed above, some slices are not useful for the classification process. For the second experiment, slices 130–140 are included. This range of slices was chosen based on visual inspection of the images: in these slices, the EC area is clearer and more informative than in other MRI slices, as shown in Figure 7.
Experiment 1.3: Exclude uninformative MRI slices.
Rather than choosing a specific range of MRI slices, in this experiment the MRI slices consisting entirely of black pixels (empty slices) are deleted, as they provide no information for the classification task. The Pillow library in Python is used to analyze each MRI slice; if all the pixels of a slice are black, the slice is ignored.
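A minimal sketch of this all-black-pixel check with Pillow, using two synthetic slices rather than real MRI data:

```python
import numpy as np
from PIL import Image

# Two illustrative slices: one empty (all black) and one with signal.
empty = Image.fromarray(np.zeros((256, 256), dtype=np.uint8))
informative = Image.fromarray(np.full((256, 256), 120, dtype=np.uint8))

def is_empty(img):
    """A slice is uninformative if every pixel is black (max value 0)."""
    lo, hi = img.getextrema()   # (min, max) pixel value of the image
    return hi == 0

kept = [s for s in (empty, informative) if not is_empty(s)]
print(len(kept), "slice(s) kept")  # the all-black slice is discarded
```

`getextrema()` avoids scanning pixels in Python: if the maximum intensity is 0, every pixel is black and the slice can be dropped.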
Experiment 1 Results:
Table 5 shows the accuracy obtained using different groups of MRI slices. By excluding the uninformative slices (Experiment 1.3), we obtained the highest accuracy of 60% compared to the other experiments. The lowest accuracy was obtained when all 256 MRI slices were included in the classification model. Based on this result, all subsequent experiments were conducted using the dataset from Experiment 1.3, as it achieved the highest accuracy.
4.2. Experiment 2: Investigate Implementing the PCA Technique for Feature Reduction
Many features are noisy, cause overfitting, and slow down training and testing. As the number of features extracted from the EC area is very large (approximately 51,200 features), reducing the number of features is necessary before using the SVM classifier. In this experiment, we used principal component analysis (PCA) to reduce the number of features included in the model, with the objective of identifying the minimum number of features that provides the maximum possible accuracy.
The optimal number of principal components (PCs) is the one that retains maximum variance. Based on the Scree plot shown in Figure 8, 2000 PCs are required to explain approximately 95% of the variance, which is significantly fewer than the 51,200 original features. Different numbers of PCs were tested in this experiment to see how the accuracy responds.
Table 6 shows the number of PCs used and the corresponding accuracy. We observe that the accuracy was 45% when the number of PCs was 2500, and the accuracy increased as the number of PCs increased. At approximately 10,000 PCs, the accuracy started to decrease. The maximum accuracy of 53% was observed for 7500 PCs.
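Selecting the number of components by explained variance can be sketched with scikit-learn's `PCA`, which accepts a variance fraction directly. The data below are synthetic, with most variance deliberately concentrated in a low-dimensional subspace; the study's real feature matrix would take their place.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the EC feature matrix: 200 samples whose
# variance lives mostly in a 5-dimensional subspace of 50 features.
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(200, 50))

# Passing a float asks PCA for the smallest number of components
# that explains at least that fraction of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(pca.n_components_, "components explain",
      pca.explained_variance_ratio_.sum().round(3), "of the variance")
```

The same call on the 51,200-dimensional EC features would return the component count read off the Scree plot in Figure 8.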
4.3. Experiment 3: Evaluate the Accuracy of the SVM Classifiers with Tuned Parameters
For tuning the parameters of the SVM classifier, a grid search is used. We examined different values for the four key parameters of the SVM classifier: kernel, C, gamma, and degree. In this experiment, the model inputs and the feature extraction network remain fixed. The highest accuracy of 56% was obtained using the parameters listed in Table 7. Below is a description of each hyperparameter tuned in this experiment.
For the kernel hyperparameter, three kernel types were tested: linear, radial basis function (rbf), and poly. Both poly and rbf produced the highest accuracy, 56%. The poly kernel with a degree of 2 was chosen because its execution time is shorter than that of rbf.
The C hyperparameter is responsible for adding a penalty for incorrectly classified points. The C values examined in this experiment are [0.1, 1, 10, and 100]. Note that increasing the value of C can result in overfitting and poor generalization for testing datasets.
Smaller gamma values are known to produce a generalized decision boundary, whereas larger gamma values produce a complex decision boundary that may overfit the training data. The Sklearn library [48] provides two arguments for gamma: ‘scale’ and ‘auto’. The ‘scale’ value is computed as 1 / (n_features × X.var()), where n_features is the number of features in the dataset and X.var() is the variance of the data. For the ‘auto’ argument, gamma is set to a fixed value based only on the number of features. Both arguments were tested in this experiment, with a higher accuracy obtained for the ‘scale’ argument.
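The grid search over these four hyperparameters can be sketched with scikit-learn's `GridSearchCV`. The iris classes 0 and 1 below are a toy binary stand-in for the NC-vs-MCI features; the grid itself mirrors the values discussed above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy binary problem (iris classes 0 and 1) standing in for the
# NC-vs-MCI feature vectors.
X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]

param_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", "auto"],
    "degree": [2, 3],          # only relevant for the poly kernel
}
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold CV per combination
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

`GridSearchCV` evaluates every combination with cross-validation and exposes the winning configuration via `best_params_`, analogous to the tuned values reported in Table 7.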
4.4. Experiment 4: Tuning VGG16, Inception-V3, and ResNet50 Network Parameters
In this experiment, our objective is to improve the performance of the CNN classifier by tuning the parameters of each pre-trained model (VGG16, Inception-V3, and ResNet50) separately. In the following tests, we examined different values of the epoch size, optimization method, and learning rate for each model.
Tuning Parameters of the VGG16
Epochs: The performance of VGG16 with different epoch values is evaluated in this experiment. The examined epoch values ranged from 25 to 300, as listed in Table 8. As can be seen in the table, the accuracy remains nearly constant (around 0.54–0.55) from 25 to 200 epochs, whereas the accuracy increases at 300 epochs.
Table 8. Accuracy of the VGG16 model across different values of the epoch parameter.

| Classifier | Feature Extraction | Epoch | Accuracy |
|---|---|---|---|
| CNN | VGG16 | 25 | 0.55 |
| CNN | VGG16 | 50 | 0.55 |
| CNN | VGG16 | 100 | 0.54 |
| CNN | VGG16 | 200 | 0.54 |
| CNN | VGG16 | 300 | 0.57 |
Optimization Method: The different optimization methods evaluated in this test are listed in Table 9. The model is tuned with Adam, stochastic gradient descent (SGD), and root mean square propagation (RMSprop). The highest accuracy was obtained with the SGD optimizer, while the lowest accuracy was obtained with the RMSprop optimizer.

Table 9. Accuracy of the VGG16 model for different optimization methods.

| Classifier | Feature Extraction | Epoch | Optimization Method | Accuracy |
|---|---|---|---|---|
| CNN | VGG16 | 300 | Adam | 0.57 |
| CNN | VGG16 | 300 | SGD | 0.60 |
| CNN | VGG16 | 300 | RMSprop | 0.54 |
Learning rate: The model performance is evaluated for different learning rate values, as shown in Table 10. The learning rate values range from 0.1 to 1 × 10⁻⁵. The optimizer used is SGD, and the epoch size is 300. The accuracy obtained with a large learning rate (0.1) is 62%, the accuracy with a small learning rate (1 × 10⁻⁵) decreases to 59%, and the accuracy for the middle learning rate values (0.01 and 0.001) increases to 63%.

Table 10. Accuracy of the VGG16 model for different values of the learning rate parameter.

| Classifier | Feature Extraction | Epoch | Optimization Method | Learning Rate | Accuracy |
|---|---|---|---|---|---|
| CNN | VGG16 | 300 | SGD | 0.1 | 0.62 |
| CNN | VGG16 | 300 | SGD | 0.01 | 0.63 |
| CNN | VGG16 | 300 | SGD | 0.001 | 0.63 |
| CNN | VGG16 | 300 | SGD | 0.0001 | 0.60 |
| CNN | VGG16 | 300 | SGD | 0.00001 | 0.59 |
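The trend in Table 10 follows a general property of gradient descent, which can be illustrated without any neural network: on a simple quadratic loss, a too-large learning rate overshoots, a too-small one barely moves, and a moderate one converges quickly. This is a conceptual sketch, not the paper's training code.

```python
# Gradient descent on f(w) = w**2 with different learning rates.
def final_w(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w**2 is 2w
    return w

for lr in (1.1, 0.0001, 0.1):
    print(f"lr={lr}: |w| after 50 steps = {abs(final_w(lr)):.3g}")
# lr=1.1 diverges, lr=0.0001 barely decreases, lr=0.1 converges to ~0
```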
Tuning Parameters of Inception-V3
Epochs: The performance of Inception-V3 with different epoch values is evaluated in this experiment. The examined epoch values ranged from 25 to 300, as listed in Table 11. The highest accuracy obtained is 67%, at 200 epochs.

Table 11. Accuracy of the Inception-V3 model for different epoch values.

| Classifier | Feature Extraction | Epoch | Accuracy |
|---|---|---|---|
| CNN | Inception-V3 | 25 | 0.62 |
| CNN | Inception-V3 | 50 | 0.63 |
| CNN | Inception-V3 | 100 | 0.65 |
| CNN | Inception-V3 | 200 | 0.67 |
| CNN | Inception-V3 | 300 | 0.64 |
Optimization Method: The different optimization methods evaluated in this test are listed in Table 12. The model is tuned with Adam, SGD, and RMSprop. The highest accuracy of 67% was obtained using the Adam optimizer, while the lowest accuracy of 49% was obtained with the RMSprop optimizer.

Table 12. Accuracy of Inception-V3 for different optimization methods.

| Classifier | Feature Extraction | Epoch | Optimization Method | Accuracy |
|---|---|---|---|---|
| CNN | Inception-V3 | 300 | Adam | 0.67 |
| CNN | Inception-V3 | 300 | SGD | 0.56 |
| CNN | Inception-V3 | 300 | RMSprop | 0.49 |
Learning rate: The model performance is evaluated for different learning rate values, as shown in Table 13. The learning rate values range from 0.1 to 1 × 10⁻⁵. The optimizer used is Adam, and the epoch size is 200. The accuracy obtained with a large learning rate (0.1) is 60%, the accuracy with a small learning rate (1 × 10⁻⁵) is 61%, and the highest accuracy of 67% was obtained with a learning rate of 0.0001.

Table 13. Accuracy of Inception-V3 for different learning rate values.

| Classifier | Feature Extraction | Epoch | Optimization Method | Learning Rate | Accuracy |
|---|---|---|---|---|---|
| CNN | Inception-V3 | 300 | SGD | 0.1 | 0.60 |
| CNN | Inception-V3 | 300 | SGD | 0.01 | 0.66 |
| CNN | Inception-V3 | 300 | SGD | 0.001 | 0.63 |
| CNN | Inception-V3 | 300 | SGD | 0.0001 | 0.67 |
| CNN | Inception-V3 | 300 | SGD | 0.00001 | 0.61 |
Tuning Parameters of ResNet50
Epochs: The performance of ResNet50 with different epoch values is evaluated in this experiment. The examined epoch values ranged from 25 to 300, as listed in Table 14. The highest accuracy achieved is 63%, at both 100 and 300 epochs.

Table 14. Accuracy of the ResNet50 model for different epoch values.

| Classifier | Feature Extraction | Epoch | Accuracy |
|---|---|---|---|
| CNN | ResNet50 | 25 | 0.54 |
| CNN | ResNet50 | 50 | 0.61 |
| CNN | ResNet50 | 100 | 0.63 |
| CNN | ResNet50 | 200 | 0.61 |
| CNN | ResNet50 | 300 | 0.63 |
Optimization Method: The different optimization methods evaluated in this test are listed in Table 15. The model is tuned with Adam, SGD, and RMSprop. The highest accuracy was obtained with the RMSprop optimizer, while the lowest accuracy was obtained with the SGD optimizer.

Table 15. Accuracy of the ResNet50 model for different optimization methods.

| Classifier | Feature Extraction | Epoch | Optimization Method | Accuracy |
|---|---|---|---|---|
| CNN | ResNet50 | 300 | Adam | 0.63 |
| CNN | ResNet50 | 300 | SGD | 0.55 |
| CNN | ResNet50 | 300 | RMSprop | 0.65 |
Learning rate: The model performance is evaluated for different learning rate values, ranging from 0.1 to 1 × 10⁻⁶. The optimizer used is RMSprop, and the epoch size is 300. The highest accuracy of 66% was obtained at learning rates of 0.01 and 1 × 10⁻⁵, as highlighted in Table 16.

Table 16. Accuracy of the ResNet50 model for different learning rate values.

| Classifier | Feature Extraction | Epoch | Optimization Method | Learning Rate | Accuracy |
|---|---|---|---|---|---|
| CNN | ResNet50 | 300 | RMSprop | 0.1 | 0.61 |
| CNN | ResNet50 | 300 | RMSprop | 0.01 | 0.66 |
| CNN | ResNet50 | 300 | RMSprop | 0.001 | 0.64 |
| CNN | ResNet50 | 300 | RMSprop | 0.0001 | 0.65 |
| CNN | ResNet50 | 300 | RMSprop | 0.00001 | 0.66 |
| CNN | ResNet50 | 300 | RMSprop | 0.000001 | 0.61 |
4.5. Experiment 5: Choose the Optimal Combinations of Feature Extraction Techniques and the Classifier
This experiment examines different combinations of the feature extraction networks tuned in previous experiments along with different classifiers. The metrics used to evaluate the results include accuracy, F1 score, sensitivity, specificity, and AUC.
Experiment 5.1: VGG16 + CNN
For this experiment, the VGG16 network architecture was employed for feature extraction, while CNN was used to classify MCI and CN samples. The model correctly predicted 70% of the dataset samples. Moreover, the model showed an acceptable balance between precision and recall, achieving an F1 score of 66%. The model correctly identified 69% of the positive cases in the dataset and 70% of the negative samples. The obtained AUC value was 68%.
Experiment 5.2: Inception-V3 + CNN
In this experiment, the Inception-V3 network architecture was used for feature extraction, while the CNN was used to classify MCI and CN samples. The model correctly predicted 70% of the dataset samples and demonstrated a good balance between precision and recall, as evidenced by its F1 score of 73%. The model correctly identified 90% of the positive cases and 54% of the negative samples in the dataset. The obtained AUC value was 69%.
Experiment 5.3: ResNet50 + CNN
The ResNet50 network architecture was used for feature extraction, while the CNN was used to classify MCI and CN samples. The model correctly predicted 73% of the dataset samples and demonstrated an acceptable balance between precision and recall, as evidenced by its F1 score of 65%. The model correctly identified 58% of the positive cases and 84% of the negative samples in the dataset. The obtained AUC value was 63%.
Experiment 5.4: VGG16 + SVM
The VGG16 network architecture was used for feature extraction, while the SVM classifier was used to classify MCI and CN samples. The model correctly predicted 76% of the dataset samples, which was the highest accuracy value achieved compared to the other five experiments. Furthermore, the model achieved an F1 score of 47%. The model correctly recognized 34% of the positive cases in the dataset and 64% of the negative samples. The obtained AUC value was 63%.
Experiment 5.5: Inception-V3 + SVM
The Inception-V3 network architecture was used for feature extraction, while the SVM classifier was used to classify MCI and CN samples. The model correctly predicted 66% of the dataset samples and achieved an F1 score of 47%. The model also correctly identified 66% of the positive cases and 45% of the negative samples in the dataset. The obtained AUC value was 63%.
Experiment 5.6: ResNet50 + SVM
The ResNet50 architecture was used for feature extraction, while the SVM classifier was used to classify MCI and CN samples. The model correctly predicted 69% of the dataset samples and achieved an F1 score of 58%. The model correctly identified 48% of the positive cases in the dataset and 68% of the negative samples. The obtained AUC value was 67%.
Table 17 shows the results of the classification of MCI vs. CN samples. The performance metrics for the six experiments conducted in this study are visualized in Figure 9.
In this work, we ran FreeSurfer on the Aziz supercomputer [49]. The processes performed by FreeSurfer are resource-intensive, requiring vast quantities of CPU time, memory, and disk space. We processed the MRI data in parallel, which helped to reduce the time needed to extract the EC data. All the data used in this work were processed on the same machine on the Aziz supercomputer, whereas the deep learning experiments were implemented on Google Colaboratory Pro. The device used to run the experiments was an Intel i7-1165G7 with 16 GB of RAM.
4.6. Experiment 6: Comparison with State-of-the-Art MCI Classification Systems
Leandrou et al. [35] used the EC area for the classification of MCI vs. NC and validated their approach on the ADNI dataset using a machine learning classifier implemented with binary logistic regression. The model’s input was MRIs that include only the EC region, and texture features of the EC were used to differentiate between the samples. Their reported AUC value was 71%. In this study, the two models using Inception-V3 and ResNet50 as feature extractors with a CNN classifier achieved AUCs of 69%, which is comparable to the results obtained by Leandrou et al. [35]. These results demonstrate that the proposed method was able to differentiate between NC and MCI samples using the EC area.
5. Conclusions
This study implements a deep learning system for predicting MCI using the EC area as a biomarker. Investigating the EC area is crucial because changes in this area occur before changes in the hippocampus, which helps in the early diagnosis of MCI. In addition, limited studies have used the EC as a biomarker for MCI because the area is small compared with the hippocampus, making its changes challenging to detect. The approach in this study uses the EC area to predict MCI using neural networks and machine learning algorithms.
Experiments in this research were conducted to produce an efficient classification system capable of differentiating between MCI and NC samples. The experiments mainly focused on brain slices used as inputs for the classification system, the feature extraction techniques, and the classifier. We performed investigations using different groups of MRI slices as inputs for the classification system. By excluding the uninformative slices, we obtained the highest accuracy compared to other experiments. We also investigated implementing the PCA technique for feature reduction before implementing the SVM classifier. We obtained a maximum accuracy of 53% with 7500 PCs. In this research, we evaluated the accuracy of the SVM classifier with tuned hyperparameters for kernel, C, gamma, and degree. The highest accuracy of the model was obtained for C = 0.1, degree = 2, gamma = scale, and kernel = poly.
We improved the performance of the CNN classifier by separately tuning the parameters of each pre-trained model (VGG16, Inception-V3, and ResNet50). We also examined different values for the epoch size, optimization method, and learning rate for each model. We found that using Inception-V3 as a feature extractor and CNN as a classifier produced the highest performance compared to the other models implemented.
A limitation of this research is that the model inputs are limited to MRI data, whereas other types of data, such as clinical, genetic, and genomics, are considered to be out of scope. In future work, we will extract the features of the hippocampus and EC area and use them as inputs for the proposed classification system. Combining the hippocampus with the EC area could improve the performance of the classification system.