This research collected data from the Swedish National Study on Aging and Care (SNAC) for the experimental evaluation of the proposed model (FEB-SVM). The SNAC is a longitudinal consortium that has been collecting multimodal data from the Swedish senior population to “create trustworthy, comparable, durable datasets” to be used for aging research and aged care [
21]. The SNAC was developed as a multifunctional program to explore healthcare quality for the aging population. It comprises a database containing details on physical assessment, cognition, social variables, lifestyle factors, medical records, and more. The SNAC database is collected at four sites: Blekinge, Skåne, Nordanstig, and Kungsholmen, which together span Swedish counties, a municipality, and a borough. This research adopted the SNAC-Blekinge baseline examination, with data gathered from 2000 to 2003. In the literature, there is substantial evidence that environmental factors may impact dementia development [
22,
23]. This research is based on standard criteria and uses data from an urban area (Blekinge). The selection criteria used to include or exclude individuals in this investigation were applied as follows:
From the 1402 participants in the SNAC-Blekinge baseline, 726 participants (313 males and 413 females) were included after the application of the selection criteria; 91 of them (12.5%) developed dementia within the following 10 years, while 635 (87.5%) remained free of dementia. The demographics of the sample population in the collected dataset are shown in
Table 1. The factors chosen from the SNAC-Blekinge database were based on published research [
24,
25]. The collected dataset (SNAC-Blekinge) consists of 13 physical measurement parameters: body mass index (BMI), pain in the last four weeks, heart rate while sitting, heart rate while lying, blood pressure on the right arm, hand strength in the right arm, hand strength in the left arm, feeling of safety when rising from a chair, assessment of rising from a chair, single-leg standing on the right leg, single-leg standing on the left leg, dental prosthesis, and number of teeth.
It is important to note that all of the features used in the SNAC were selected based on evidence relevant to the aging process [
21]. At the commencement of the study (2000–2003), 75 variables were identified from the following categories: demographics, lifestyle, social, psychological, medical history, physical examination, blood tests, and the assessment of various health instruments connected to dementia examination. The target variable employed by the proposed model, dementia status 10 years after the SNAC baseline, was provided by medical practitioners. The dementia diagnosis was made using the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV).
Table 2 provides the feature categories and names from the selected dataset (SNAC-Blekinge).
Proposed Work
In this work, we designed an automated diagnostic technique for the early prediction of dementia using machine learning and data mining approaches. The proposed diagnostic system is divided into two modules: the first extracts valuable features from the dataset to avoid model overfitting, and the second works as a classifier to predict dementia. We developed a novel feature extraction method based on linear discriminant analysis (LDA), independent component analysis (ICA), principal component analysis (PCA), locally linear embedding (LLE), and t-distributed stochastic neighbor embedding (t-SNE). These feature extraction methods are cascaded into a single component, which we named the “feature extraction battery” (FEB). Feature extraction begins with an initial set of measured data and creates derived values (features) that are meant to be informative and non-redundant, facilitating subsequent learning and generalization phases and, in some situations, leading to improved human interpretability. Feature extraction reduces the dimensionality of the dataset, which in turn reduces the computational complexity of the machine learning models. The features extracted by the FEB are fed into the predictive module of the proposed diagnostic system for the prediction of dementia. We employed a support vector machine (SVM) as the predictive module; the workflow of the proposed diagnostic system is shown in Figure 1.
The first stage of the proposed diagnostic system is data preprocessing because data play a vital role in predictive ML models. The dataset is refined, standardized, and normalized. We deal with the missing values in the data preprocessing stage by employing K-nearest neighbors (KNN) imputation [
26]. This technique finds the K items most comparable (nearest) to a sample with missing data and replaces each missing value with the mean (or most common value) of that feature among those K neighbors. The selected dataset portrays highly imbalanced classes; hence, KNN imputation is employed independently on missing data from the majority and minority classes. This reduces the chance of contaminating the minority class with values derived from the majority class. Following the resolution of missing values, we applied the StandardScaler function to the selected dataset. The StandardScaler function standardizes a dataset by removing the mean and scaling to unit variance. A sample’s standard score $z$ is computed as follows:

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ denotes the mean of the training samples and $\sigma$ is the standard deviation of the training samples. By calculating the relevant statistics on the training set samples, standardization and scaling are performed independently on each feature. The mean and standard deviation are then saved and later used to transform additional data.
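To make the preprocessing stage concrete, the following is a minimal sketch, assuming scikit-learn; the per-class imputation mirrors the description above, and names such as `impute_per_class` and `X_raw` are illustrative rather than taken from the paper.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

def impute_per_class(X, y, k=5):
    """Impute missing values separately within each class so that minority
    samples are not filled in with majority-class statistics."""
    X_out = X.copy()
    for label in np.unique(y):
        mask = (y == label)
        X_out[mask] = KNNImputer(n_neighbors=k).fit_transform(X[mask])
    return X_out

X_imputed = impute_per_class(X_raw, y)  # X_raw: the 75 raw SNAC features

# Standardize each feature: z = (x - mu) / sigma, with mu and sigma
# learned from the training data and stored for transforming later samples.
scaler = StandardScaler()
X_std = scaler.fit_transform(X_imputed)
```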
When ML models are trained using all the feature space of a dataset, they tend to overfit, which means that ML models display improved performance on training data but poor performance on testing data [
27,
28]. This might be because the classifier learned superfluous or noisy features in the training data, or it could be due to a weak classifier with too many parameters. As a result, we should extract a subset of features from the dataset and use a properly constructed classifier. In feature extraction methods, new features are constructed from the given dataset. Feature extraction decreases the resources necessary to describe a vast dataset. One of the primary issues with analyzing complex data is the number of variables involved. Analysis with many variables often necessitates substantial memory and computing capacity, and it may also lead a classification method to overfit the training examples and generalize poorly to new samples. Feature extraction is a broad term encompassing ways of building variable combinations that avoid these issues while accurately summarizing the data. Many machine learning practitioners feel that well-optimized feature extraction is the key to good model design [
29]. Therefore, we propose a novel feature extraction method (FEB) to avoid the problem of model overfitting while simultaneously reducing data dimensionality. The reduced data dimensionality improves the time complexity of the proposed FEB-SVM. In the FEB, we cascaded five feature extraction methods (LDA, PCA, ICA, LLE, and t-SNE) into a single module. Four of the methods (PCA, ICA, LLE, and t-SNE) each construct two new features, while LDA constructs only one. The newly extracted FEB features are combined to generate an optimized dataset with low dimensionality: the proposed FEB constructs nine new features from the original dataset of 75 features.
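A minimal sketch of the FEB is given below, assuming scikit-learn implementations of the five extractors; the function name `build_feb_features` is illustrative. Note that, unlike the other extractors, scikit-learn's t-SNE offers no transform for unseen data, so this sketch embeds only the data it is fitted on.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import LocallyLinearEmbedding, TSNE

def build_feb_features(X, y):
    """Concatenate the outputs of the five extractors into 9 FEB features."""
    extractors = [
        LinearDiscriminantAnalysis(n_components=1),               # supervised, 1 feature
        PCA(n_components=2),                                      # 2 features
        FastICA(n_components=2, random_state=0),                  # 2 features
        LocallyLinearEmbedding(n_components=2, random_state=0),   # 2 features
        TSNE(n_components=2, random_state=0),                     # 2 features
    ]
    parts = []
    for ext in extractors:
        if isinstance(ext, LinearDiscriminantAnalysis):
            parts.append(ext.fit_transform(X, y))  # LDA needs the class labels
        else:
            parts.append(ext.fit_transform(X))
    return np.hstack(parts)                        # shape: (n_samples, 9)
```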
After the feature extraction stage, we divided the dataset into two parts: one for training and the other for testing the proposed FEB-SVM model. The classes in the dataset are highly imbalanced, which means that the model would be biased toward the majority class. To address this issue, we used the adaptive synthetic sampling (ADASYN) approach [
30]. The ADASYN approach uses a density distribution $\hat{r}_i$ as a criterion to automatically compute the number of synthetic samples necessary for each minority data sample. Here, $\hat{r}_i$ is a measurement of the weight distribution of individual minority class instances depending on their level of learning difficulty. Following ADASYN, the final dataset not only provides a balanced class distribution (as defined by the balance coefficient $\beta$) but also compels the learning algorithm to concentrate on difficult cases. As a consequence, the proposed system (FEB-SVM) is trained on balanced data, mitigating the risk of bias in the ML model. It is worth noting that the ADASYN approach is applied to the training data after the data split. If the ADASYN technique were applied to the entire dataset (i.e., before data partitioning), the measured performance of the ML model would be skewed because samples from the testing dataset would also be included in the training dataset. After using ADASYN to balance the training dataset, we employed an SVM for the classification task.
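The following is a sketch of this split-then-balance order, assuming the imbalanced-learn package; `X_feb` denotes the nine FEB features, and the split ratio is an illustrative choice.

```python
from imblearn.over_sampling import ADASYN
from sklearn.model_selection import train_test_split

# Split first, then oversample only the training portion so that no
# test-set information leaks into the balanced training data.
X_train, X_test, y_train, y_test = train_test_split(
    X_feb, y, test_size=0.2, stratify=y, random_state=0)

X_train_bal, y_train_bal = ADASYN(random_state=0).fit_resample(X_train, y_train)
```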
SVM is a powerful tool for classification and regression problems [
31]. The SVM attempts to construct a hyperplane with the greatest possible margin. In the case of a classification problem, the hyperplane $w^{T}x + b = 0$, where $b$ denotes the bias and $w$ represents a weight vector, is built using training data and serves as a decision boundary for determining the class of a data point (a multidimensional feature vector). In the case of binary classification, the SVM identifies the nearest vectors (data points) of the two classes to create a margin; these vectors are referred to as support vectors. The margin is the perpendicular distance between the two parallel hyperplanes passing through the support vectors, which equals $2/\lVert w \rVert$. The primary objective is to develop an optimized SVM predictive model that provides an ideal hyperplane with the highest margin. The SVM employs a set of slack variables $\xi_i$, $i = 1, 2, \ldots, N$, as well as a penalty parameter $C$, and attempts to maximize the margin while minimizing misclassification errors. This is mathematically expressed as follows:

$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i\left(w^{T}x_i + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0$$

where $\xi_i$ is the slack variable used to calibrate the degree of misclassification, $C$ is the penalty factor, and $\lVert w \rVert$ is the Euclidean norm, also known as the L2-norm.
The major difficulty is that a linear hyperplane may not correctly partition the data points of the binary classes (i.e., with the lowest classification error). For this reason, the SVM employs a kernel technique in which the model maps data points into a higher-dimensional space so that non-separable data points become separable. Different kernels are used, including the radial basis function (RBF) kernel, the linear kernel, the sigmoid kernel, and the polynomial kernel. The kernel and its parameters are SVM hyperparameters that must be adjusted for each task. To design the SVM model that works best on the dementia prediction task, we must carefully tune its hyperparameters, and grid search is a standard approach for this purpose. We therefore employed the grid search approach to tune the SVM hyperparameters. In summary, in this paper we propose the FEB to decrease the data dimensionality, and the proposed FEB-SVM approach optimizes the SVM hyperparameters using the grid search method.
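The following is a minimal sketch of the kernel and hyperparameter search, assuming scikit-learn; the parameter grid and scoring metric are illustrative choices, not the exact grid used in this work.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "kernel": ["rbf", "linear", "sigmoid", "poly"],
    "C": [0.1, 1, 10, 100],            # penalty parameter C
    "gamma": ["scale", 0.01, 0.1, 1],  # kernel coefficient (ignored by linear)
}

# Exhaustively evaluate every kernel/C/gamma combination with 5-fold
# cross-validation on the balanced training data.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1")
search.fit(X_train_bal, y_train_bal)
print(search.best_params_)
```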