Enhancing Depression Detection: A Stacked Ensemble Model with Feature Selection and RF Feature Importance Analysis Using NHANES Data

Selvaraj, Annapoorani; Mohandoss, Lakshmi

doi:10.3390/app14167366


by Annapoorani Selvaraj * and Lakshmi Mohandoss

Department of Data Science and Business Systems, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai 603203, Tamilnadu, India

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(16), 7366; https://doi.org/10.3390/app14167366 (registering DOI)

Submission received: 3 July 2024 / Revised: 14 August 2024 / Accepted: 15 August 2024 / Published: 21 August 2024

(This article belongs to the Special Issue Data Analysis and Machine Learning in Epidemiology of Mental and Behavioral Disorders)


Abstract:

Around the world, 5% of adults suffer from depression, which is often inadequately treated. Depression is caused by a complex relationship of cultural, psychological, and physical factors. This growing issue has become a significant public health problem globally. Medical datasets often contain redundant characteristics, missing information, and high dimensionality. By using an iterative floating elimination feature selection algorithm and considering various factors, we can reduce the feature set and achieve optimized outcomes. The research utilizes the 36-Item Short Form Survey (SF-36) from the NHANES 2015–16 dataset, which categorizes data into seven groups relevant to quality of life and depression. This dataset presents a challenge due to its imbalance, with only 8.08% of individuals diagnosed with depression. The Depression Ensemble Stacking Generalization Model (DESGM) employs stratified k-fold cross-validation and oversampling for training data. DESGM enhances the classification performance of both base learners (linear support vector machine, perceptron, artificial neural network, linear discriminant analysis, and K-nearest neighbor) and meta-learners (logistic regression). The model achieved an F1 score of 0.9904 and an accuracy of 98.17%, with no instances of depression misdiagnosed.

Keywords:

major depressive disorder; stacked ensemble model; supervised learning; feature selection; feature importance

1. Introduction

According to the World Health Organization (WHO), depression will emerge as a leading contributor to disability by 2030. Depression affects 3.8% of the population, including 5% of adults (4% of men and 6% of women) and 5.7% of those over 60 years of age. About 280 million people worldwide suffer from depression [1].

In this population, depressive symptoms are prevalent and can impair patients’ health and quality of life. Major depressive disorder (MDD) is linked to considerable psychosocial dysfunction at the individual, family, and community levels [2]. A related study found depression to be the most prevalent of all disorders, both over the lifetime (3.14%) and within the past 12 months (1.7%). Cross-cultural comparisons help researchers gauge the relative importance of environmental influences in the emergence of depression, but comparing mental health across cultures raises two primary challenges. First, there is no universally agreed-upon definition of mental health symptoms. Second, there are insufficient methods for measuring the relevant cultural and social factors that influence mental health [3]. MDD is a mental health condition resulting from a complex interplay of environmental, economic, and cultural factors [4]. The complexity of depression is partly explained by the diverse range of depressive symptoms and their occurrence in various aspects of daily life. Numerous investigations of people with depression have revealed a high prevalence of comorbidity between psychiatric and physical conditions, particularly in older adults [5]. To diagnose depression, psychiatrists typically ask patients to complete questionnaires or forms. This information helps the psychiatrist understand the patient’s symptoms and make a diagnosis. Depression negatively affects both physical and mental health, leading to decreased activity levels and impaired work performance [6]. Because the condition is poorly understood, people who suffer from depression are sometimes met with dread and even mistrust, which puts them at risk of social isolation.

A collaborative effort by many researchers has yielded machine learning technology that can accurately identify subtle daily variations in individuals and other factors that may serve as direct or indirect indicators of a developing mental health condition [7]. The interval between the onset of symptoms and the initiation of treatment for a depressive condition was typically 2–3 months. Obtaining comprehensive statistics to accurately estimate the prevalence of depression is challenging due to the sensitive nature of this information. Therefore, the accessibility of data poses a significant concern [8].

Acknowledging the significance of several disciplines and methodologies, such as interactive research techniques, our objective is to identify and investigate the epidemiological, medical, public health, and socio-cultural factors that influence depression. While studies have examined the association between these conceptual categories and other persistent illnesses, attention is now turning to machine learning approaches for studying the relationship between mental illness and other risk factors. The Patient Health Questionnaire (PHQ-9) is an instrument used to assess the presence and intensity of depressive symptoms, which can range from mild to severe [9].

National Health and Nutrition Examination Survey (NHANES) records, together with the incorporated PHQ-9 tool, are used to investigate depression and other health conditions, such as obesity [10]. The PHQ-9 was created to screen for major depressive disorder. These scales have strong psychometric properties, including reliability over time, validity, and responsiveness to treatment change. They were initially developed in the primary care context and are well-established for use in psychiatric treatment-seeking groups [11].

Machine learning techniques have been developed for automatically classifying depression. These classification methods face a difficult class imbalance problem: depression, like many other medical conditions, is relatively uncommon in the general population, so the resulting datasets are frequently imbalanced [12]. The 36-Item Short Form Survey (SF-36) is a widely used and well-studied instrument for measuring health outcomes and quality of life, based on self-reported data. Seven subcategories comprising important traits mostly associated with depression were reported.

This research develops an ensemble classifier using the SF-36’s psychological domain knowledge. The ensemble classifier is used to explore associations between depression and the following health-related aspects of the individual: social lifestyle, general, physical health status, pain, physical functionalities, mental health depression screener, and food behavior.

This work makes significant contributions in the following areas:

Identifying the primary socio-demographic and psychological determinants contributing to the development of depression.
Investigating several machine learning algorithms and feature selection techniques to detect the presence of depression effectively.
Facilitating the interpretation of high-risk groups of depressive disorders for physicians in primary care settings, who can apply the Depression Ensemble Stacking Generalization Model (DESGM) to identify and analyze the essential determinants associated with these high-risk groups.

The paper is structured as follows: Section 2 provides a summary of the research conducted on predicting depressive disorder and addressing imbalanced data in the field of mental health. Section 3 provides an overview of the data and methodologies employed in this research. The experimental results and discussion are depicted in Section 4, and the conclusion is presented in Section 5.

2. Related Work

Deep learning approaches have reduced the role of explicit feature selection in forecasting depression, but they do not reveal which socio-lifestyle, economic, demographic, and other risk factors contribute to depression [13]. Predictions of depression can also be enhanced using unsupervised machine learning techniques, which help uncover hidden patterns and relationships within the data. In one approach, a cluster solution is obtained using a self-organizing map, and classification tasks are drawn from the clustered data to evaluate the performance of a posterior probability multi-class SVM. A classification accuracy of 91.16% was achieved in this evaluation. Machine learning algorithms identify depressed groups by learning and recognizing underlying patterns in the data without predefined limits. “Graphing lifestyle-environs using machine-learning methods” (GLUMM) identified and characterized depression clusters using 96 factors from the NHANES (2009–2010). That study faced the challenge of imbalanced data because depressed participants make up only a small share of the survey data [14].

The clinical diagnosis process for MDD normally involves an interview-based methodology, an assessment of physical health, and, in particular cases, further testing. XGBoost utilizes significant biomarkers associated with depression, such as HDL cholesterol and hemoglobin, to forecast cases in a balanced sample, employing both over-sampling and under-sampling techniques to address dataset imbalances [15]. The multi-task learning (MTL) approach organizes the risk factors correlated with depression and can select subgroup-level risk factors, namely income range, hours of computer usage in the last 30 days, ethnicity, and state of residence [16].

In recent years, medical researchers have worked diligently to identify factors that increase the risk of depressive disorders. They have combined traditional statistical methods, like regression analysis, with modern data mining techniques to achieve this goal [17]. The logistic regression analysis results revealed a statistically marginal link between depression and the diabetes status of the family members. Furthermore, the trend analysis demonstrated that households with more diabetic family members were more likely to experience depression [18].

The preceding investigation examined data from a sample of 582 lonely older adult females in South Korea to comprehend and mitigate depression. The presence of obesity at the initial assessment was found to be correlated with a higher likelihood of experiencing depression [19]. A nomogram identifies the causing factors of depression in individuals at a heightened risk of anxiety, individual health, n-6 fatty acid, n-3 fatty acid, daily lack of movement, and sleep duration [20]. Research has explored machine learning to analyze heterogeneous data and examine quality of life and depression. The secure hash algorithm with an unsupervised and supervised model identified depressive factors with a 91.16% accuracy [21].

Information about participants’ health, social life, and mental state was collected to predict depression. However, the quality of data collection has led to a large number of variables and challenges in selecting suitable analysis methods [14]. Research has investigated the relationship between psychosocial factors and mental health for earlier MDD diagnosis. The Short Form Survey-20 (SF-20) health-related quality of life has six subsets to analyze depression. However, the importance of features that are mostly related to depression was not reported [9]. The MRMR feature selection method identifies the sensitive features among 20 features (Table 1) and forecasts with a boosting mechanism. In this case, the feature subset is focused on only the CES-D (Center for Epidemiologic Studies Depression Scale) factor not considering the economic, general, health status, and socio-lifestyle [22].

A study on the Korean Longitudinal Study of Aging (KLoSA) dataset (Table 1) used a genetic algorithm (GA) to select features and RF techniques to predict depression; there, class imbalance raised the error and affected the predictions [23]. Another study was limited in its handling of missing values, which affected its predictive accuracy [25]. A further study selected features manually based on diabetes-related and socio-lifestyle factors, a process that is complex and time-consuming. In the above-mentioned work, the feature selection method was not properly evaluated on mental health survey data, and the impact values of the depression-related risk factors selected from previous research studies were not provided [27].

A recent systematic review [25,26] found just a few experiments utilizing stacked ensemble classifiers, feature selection, and feature importance on survey questions, but none that used National Health and Nutrition Examination Survey (NHANES) data. This dataset is mostly utilized for public health–psychological research. However, it is worth noting that the survey includes significant health-related characteristics that have the potential to impact mental well-being. According to the authors’ understanding, this study is the first attempt to apply feature selection and importance using NHANES data.

2.1. Risk Factors

The examination of depression in adults across all age groups revealed an increase in moderate depression among those in the 20–39 age group [28]. Depression was the most common of the specific mental illnesses, and there was a stronger correlation with drug use problems [29].

The factors of marital status, finances, education level, current health status, and age group consistently demonstrated a substantial correlation with MDD in high-income countries, such as the United States [30]. MDD also contributes to suicide and heart disease, increasing the overall burden of depressive disorders [31]. Factors contributing to depression include recent weight loss, sleep duration, reduced food consumption, and problems at work [32]. This research examined various factors, including smoking, eating habits, security, a low-income ratio, body mass index, active lifestyles, alcohol consumption, health problems, and medications [33,34]. The projected estimations provided by a cohort primarily consisting of individuals with connections to sleep disorders, insufficient physical exercise and activity, occupational stress, obesity, concerns about weight loss, and unhealthy eating habits suggest a potential correlation with depression [22].

2.2. Ensemble Learning Techniques

Ensemble classifiers have shown promise for accurate diagnosis, particularly when many features are involved. A related study utilized the majority vote method for classification and regression [35]. Ensemble learning is a commonly employed technique that integrates data-level and algorithmic-level approaches to address the issue of imbalanced data [36]. In one study, various models, including gradient boosting machine (GBM), KNN, regularized greedy forest, and logistic regression, were employed to forecast the occurrence of depression among elderly individuals [37].

Voting Classifier: One of the most straightforward ensemble learning strategies is the voting classifier [38], which combines the results of multiple base classifiers. There is some flexibility in the composition of the classifiers. The majority vote of the classifiers determines the sample’s label in the hard-voting method.

y_{p} = \arg\max_{k} \sum_{i = 1}^{z} p_{i k}

(1)

where $y_{p}$ is the predicted class label, z is the number of base learners, and $p_{i k}$ is the probability of the k-th label from the i-th classifier.
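
The two voting rules can be contrasted with a small sketch. It is illustrative only: the function names `hard_vote` and `soft_vote` and the toy probabilities are assumptions, not part of the study. Hard voting counts predicted labels, whereas the summation in Equation (1) adds class probabilities across the z base learners, which is usually called soft voting.

```python
from collections import Counter

def hard_vote(predicted_labels):
    """Return the label chosen by the largest number of base classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """probabilities[i][k] is the probability of class k from classifier i.

    Returns the class index k that maximizes the summed probability.
    """
    n_classes = len(probabilities[0])
    totals = [sum(p[k] for p in probabilities) for k in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)

# Toy output of three hypothetical base learners for one sample.
probs = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
labels = [max(range(2), key=p.__getitem__) for p in probs]  # [0, 1, 1]

hard_result = hard_vote(labels)   # two of three learners vote for class 1
soft_result = soft_vote(probs)    # summed probabilities favor class 0
```

In this toy case the two rules disagree, because one confident learner outweighs two marginal ones once probabilities are summed.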

Stacking: Combining a base model (a single predictive model) with a meta-model [39] would take advantage of the best features of each model while mitigating their worst flaws, hence improving the resultant model’s reliability. In addition, a stacked ensemble has been frequently used for categorizing and creating predictive machine learning models to enhance generalization ability.

Boosting: The primary objective is to enhance the speed and accuracy of competitive efforts. XGBoost employs a boosting strategy to improve accuracy by combining weak classifiers. Currently, XGBoost is used for imputing missing values and is recognized for its exceptional enhancement in predictive accuracy. With the help of XGBoost, depression cases may be predicted based on many balanced samples acquired using different resampling techniques.

3. Proposed Method—Depression Ensemble Stacking Generalization Model (DESGM)

3.1. Data Source Representation

This study utilized NHANES data collected during the 2015–2016 cycle [39]. The NHANES study obtained ethical approval from the Research Ethics Review Board of the National Center for Health Statistics, and everyone who participated provided written consent. NHANES is a comprehensive program that integrates interviews, physical examinations, and laboratory tests to evaluate the health status and identify possible health risks for individuals of all ages in the United States.

3.2. Data Pre-Processing

The survey questions within the NHANES dataset are organized in a columnar format, with participants being assigned to rows. These rows are further categorized into several tables representing various health domains. Pre-processing is necessary for classification in the experiment due to the need for uniform format and structure among the tables. Only the survey questions component was utilized, constituting one-third of the dataset.

The work compiled the necessary data into a single file. We merged all topic-based files for each cycle and selected items (or attributes) pertinent to the area of interest (i.e., depression disorder).

$SF_{o}$ refers to a file that contains all 40 topic-based files from the 2015–16 cycle, merged using the supplied sequence ID.

Let $A = \emptyset$.
For each item f in $SF_{o}$:
- If f may indicate a symptom (e.g., trouble sleeping or sleeping too much), then add f to A.
Let $S_{o}$ denote the projection of $SF_{o}$ onto A.
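
The merge-and-project steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the tables are tiny hand-made stand-ins, `SEQN` is used as the respondent sequence ID, and the column names `age` and `sleep_trouble` as well as the helper names `merge_on_id` and `project` are hypothetical.

```python
def merge_on_id(tables, key="SEQN"):
    """Combine topic-based tables into one row per respondent ID."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return list(merged.values())

def project(rows, selected, key="SEQN"):
    """Keep only the ID and the selected items (the set A)."""
    return [{name: row.get(name) for name in [key, *selected]} for row in rows]

# Two hypothetical topic-based files sharing the same respondents.
demographics = [{"SEQN": 1, "age": 34}, {"SEQN": 2, "age": 61}]
screener = [{"SEQN": 1, "sleep_trouble": 2}, {"SEQN": 2, "sleep_trouble": 0}]

sf_o = merge_on_id([demographics, screener])   # merged file SF_o
A = ["sleep_trouble"]                          # items that may indicate a symptom
s_o = project(sf_o, A)                         # projection S_o
```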

Normalization was additionally required to standardize the scale condition across different questions. Categorical features were encoded with numerical values for analysis. The dataset underwent additional examination to identify missing values within the variables. To improve data quality, variables with more than 40% missing information were removed from the dataset. Given the common occurrence of incomplete data in real-world studies, the expectation maximization (EM) algorithm was used to estimate missing values. This algorithm identifies the most likely values for unknown data points by iteratively refining estimates. After applying the EM method to fill in missing information, the initial 189 variables from the NHANES 2015–16 dataset were reduced to 110 distinct features. Following the pre-processing stage (Table 2), a comprehensive dataset was generated, comprising 5134 individuals.
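
The 40% missingness rule can be illustrated with a short sketch. The helper name `drop_sparse_columns`, the column names, and the toy rows are assumptions; missing answers are represented here as `None`, and the EM imputation step is not shown.

```python
def drop_sparse_columns(rows, max_missing=0.40):
    """Remove variables whose share of missing values exceeds max_missing."""
    columns = list(rows[0].keys())
    kept = [
        c for c in columns
        if sum(r[c] is None for r in rows) / len(rows) <= max_missing
    ]
    return [{c: r[c] for c in kept} for r in rows]

# Five hypothetical respondents: "a" is 60% missing, "b" is 20% missing.
rows = [
    {"a": None, "b": 1, "c": 3},
    {"a": None, "b": None, "c": 1},
    {"a": None, "b": 0, "c": 2},
    {"a": 4, "b": 2, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
cleaned = drop_sparse_columns(rows)
```

Only variable "a" crosses the threshold here; the remaining gaps in "b" would be left for the imputation step.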

3.3. Feature Selection: Iterative with Floating Elimination Algorithm (IFEA)

Clinical medical datasets often encounter the challenge of the curse of dimensionality, which refers to the presence of unnecessary and duplicated variables accompanied by ambiguity. The attribute selection process is important in studying medical and healthcare data [8,20,21]. This study assessed the feature dependence of the models for depression prediction to develop an accurate model depending on a constrained set of available features. The selected data files encompass various attributes of individuals’ quality of life (QoL). These features included physical fitness and activities, smoking and alcohol consumption habits, current residential and employment status, monthly income and expenses, general health conditions, and dietary intake. After eliminating several unrelated primary inquiries, a total of 110 features were selected, which was denoted as the number of desired features, d.

Feature selection is a prevalent issue encountered in various machine learning applications, encompassing classification, regression, and other related domains. The effectiveness of an inference model can be enhanced by employing an appropriate feature selection technique [39]. Feature subset selection is a technique that aims to identify the best subset of features based on the classifier’s accuracy on the training data. The approach is a sequential process that starts with the full feature set and progressively removes variables from the DESGM model at each phase, ultimately identifying a reduced model that best explains the data. Backward feature selection starts with all available features. In each step, Algorithm 1 (IFEA) evaluates the importance of each feature based on a specific criterion. The least important feature is then removed from the set, and the process is repeated until the desired number of features remains.

This process continues until the required dimensionality is attained. The floating step is conditional: it occurs only if the criterion function rates the resulting feature subset as “better” after a specific feature is added back. This strategy is intended to yield a stronger feature subset than purely sequential search strategies. In step 4, the objective is to identify any previously eliminated feature that, when reintroduced into the feature subset, enhances the performance of the classifier. If such features are present, the feature $x^{+}$ that yields the greatest performance improvement is incorporated. If no improvement can be produced, that is, if no such feature $x^{+}$ can be found, the procedure returns to step 3.

Algorithm 1: Iterative with Floating Elimination Algorithm (IFEA)
Input: $Y = \{y_{1}, y_{2}, \dots, y_{k}\}$ // Y is the set of all features
Output: $X_{d} = \{x_{i} \mid i = 1, 2, \dots, d;\ x_{i} \in Y\}$, where $d \in \{1, 2, \dots, k\}$ is the number of desired features
1.	Initialize X₀ = Y
2.	for i = 1 → k − d:
		$x^{-} = \arg\max C(X_{d} - x_{i})$, where $x_{i} \in X_{d}$ // $x^{-}$ maximizes the criterion function upon removal
		$X_{d - 1} = X_{d} - x^{-}$ // Remove feature $x^{-}$ from the feature subset $X_{d}$
		$k = k - 1$
	End for
3.	If ($|X_{d}|$ < minimal number of desired features):
		Go to step 4
	End if
4.	$x^{+} = \arg\max C(X_{d} + x)$, where $x \in Y - X_{d}$
	If $C(X_{d} + x^{+}) > C(X_{d})$:
		$X_{d + 1} = X_{d} + x^{+}$
		$k = k + 1$
		Go to step 3
	End if
5.	End
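
One way to read Algorithm 1 is as a sequential backward floating search. The sketch below follows that reading and is an assumption, not the authors' implementation: the function name, the toy criterion (a sum of made-up feature weights standing in for classifier accuracy), and the rule that a feature is re-added only when it beats the best subset of that size seen so far are all illustrative choices.

```python
from typing import Callable, FrozenSet, Iterable

def floating_backward_selection(
    features: Iterable[str],
    criterion: Callable[[FrozenSet[str]], float],
    d: int,
) -> FrozenSet[str]:
    """Backward elimination with a conditional floating re-inclusion step."""
    all_features = frozenset(features)
    current = all_features
    best = {len(current): criterion(current)}  # best score seen per subset size
    while len(current) > d:
        # Exclusion: drop the feature whose removal leaves the best subset.
        x_minus = max(sorted(current), key=lambda f: criterion(current - {f}))
        current = current - {x_minus}
        size = len(current)
        best[size] = max(criterion(current), best.get(size, float("-inf")))
        # Floating step: re-add an eliminated feature only if the grown subset
        # beats the best subset of that size found so far.
        while current != all_features:
            x_plus = max(sorted(all_features - current),
                         key=lambda f: criterion(current | {f}))
            grown = current | {x_plus}
            if criterion(grown) <= best.get(len(grown), float("-inf")):
                break
            current, best[len(grown)] = grown, criterion(grown)
    return current

# Hypothetical feature weights; "noise" hurts the criterion.
weights = {"sleep": 3, "appetite": 2, "income": 1, "noise": -1}
selected = floating_backward_selection(
    weights, lambda subset: sum(weights[f] for f in subset), d=2
)
```

With an additive toy criterion like this one the floating step never fires; it matters when features interact, so that a feature dropped early becomes useful again after others are removed.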

3.4. Stacked Ensemble Model—DESGM

The idea of meta-level-based learning is used in stacking ensemble learning. As shown in Figure 1, various ML methods are used to generate the set of base learners. Each primary classifier’s hyperparameter can be optimized to increase accuracy, and diversity can be attained through various techniques, such as various training samples, sets of features, and factors.

Base Learners

The DESGM model employs various supervised learning techniques to classify patients who are at risk. The approach utilizes both training and testing data to construct a model capable of predicting the appropriate output label for new observations.

Perceptron:

The perceptron is the most basic type of neural network and is used to categorize patterns that are assumed to be linearly separable. A single-neuron perceptron can only separate two classes. The perceptron algorithm follows the steps below.

Input: X represents the set of training samples, each a feature vector $x^{(i)} = (x_{1}, x_{2}, \dots, x_{m})^{T}$.

Initialize w = 0.
For each training sample $x^{(i)}$, the output value $\hat{y}^{(i)}$ is computed using the following activation function:

$\hat{y}^{(i)} := \Phi(w^{T} x^{(i)})$

(2)
Update the weights.
The weight vector w is then updated based on the computed output $\hat{y}^{(i)}$ and the actual class label $y^{(i)}$ of the ith training sample using the update rule:

w := w + \Delta w, \quad \Delta w = \eta (y^{(i)} - \hat{y}^{(i)}) x^{(i)}

(3)

where $\eta$ is the learning rate, $0 < \eta < 1$. The activation function Φ(z) is defined in the following equation:

\Phi(z) = \begin{cases} 1 & \text{if } z \geq 0, \\ -1 & \text{otherwise,} \end{cases} \quad z = w^{T} x = w_{0} + w_{1} x_{1} + w_{2} x_{2} + \dots + w_{m} x_{m}

(4)

where $w_{0}$ functions as a bias term, and the binary classifier uses the value 1 for the depression class and −1 for the non-depression class. Including the bias term lets the decision boundary shift away from the origin, which helps the model generalize to larger sets of data.
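
The perceptron steps in Equations (2) to (4) can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, the learning rate of 0.5, the epoch count, and the four toy samples are assumptions chosen so the data are linearly separable.

```python
def phi(z: float) -> int:
    """Activation from Equation (4): 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1

def predict(w, x):
    """w[0] is the bias w_0; the remaining weights pair with the features."""
    return phi(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_perceptron(samples, labels, eta=0.5, epochs=100):
    """Apply the update rule of Equation (3) sample by sample."""
    w = [0.0] * (len(samples[0]) + 1)        # initialize w = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            step = eta * (y - predict(w, x))  # zero when the prediction is correct
            w[0] += step                      # bias input is fixed at 1
            for j, xj in enumerate(x):
                w[j + 1] += step * xj
    return w

# Toy, linearly separable data: only (1, 1) belongs to the positive class.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
weights = train_perceptron(samples, labels)
predictions = [predict(weights, x) for x in samples]
```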

3.5. Artificial Neural Network (ANN)

The capacity of artificial neural networks (ANNs) to acquire knowledge and make inferences from non-linear patterns in data makes them highly suitable for application in domains like health-related prediction. An ANN comprises a series of interconnected processing components known as neurons. The neurons are typically organized into multiple layers and communicate through weighted connections. The network has an input layer, responsible for receiving and presenting data to the network, and an output layer, which captures the network’s response to the input. The nodes within the hidden layer act as intermediary variables, enabling the ANN to capture intricate non-linear relationships between the input data and the outcome. The current research uses the following hyperparameters: a hidden layer with a value of 2 and a weight decay of 0.09.

3.6. K-Nearest Neighbor (KNN)

The KNN technique, known for its non-parametric characteristics, is widely employed in addressing classification tasks based on distance metrics. The KNN algorithm, being an instance-based learning approach, classifies a new instance by evaluating its similarity with the most similar samples in the training set. The estimation of similarity among instances can be achieved through the utilization of distance metrics. KNN uses Manhattan, Minkowski, Hamming, Euclidean, and other distance metrics. Finding the proper value for k is essential to balance overfitting and underfitting the data.

The set χ is equipped with a metric function d. Let d: χ × χ → ℝ be a function that computes the distance between any two elements of χ. In this study, the Minkowski distance function was applied.

d(x, y) = \|x - y\|_{p} ≝ \left(\sum_{i = 1}^{m} |x_{i} - y_{i}|^{p}\right)^{\frac{1}{p}}

(5)

Consider a sequence of training instances denoted as $S = \{(x_{1}, y_{1}), \dots, (x_{m}, y_{m})\}$. For any element $x$ belonging to the set χ, define $π_{1}(x), \dots, π_{m}(x)$ as a permutation of {1, …, m} ordered by the distances $d(x, x_{i})$ to x, so that $d(x, x_{π_{i}(x)}) \leq d(x, x_{π_{i + 1}(x)})$. The KNN rule then returns the majority label among the k nearest instances:

h_{S}(x) = \text{majority label of } \{y_{π_{i}(x)} : i \leq k\}

(6)
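
A compact sketch of the KNN rule with the Minkowski distance of Equation (5) is given below. The function names, the choice k = 3, and the toy training points are assumptions made for illustration; p = 2 gives the Euclidean distance and p = 1 the Manhattan distance.

```python
from collections import Counter

def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def knn_predict(train, x, k=3, p=2):
    """train is a list of (features, label); return the majority label
    among the k training instances closest to x."""
    ranked = sorted(train, key=lambda pair: minkowski(x, pair[0], p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Two hypothetical, well-separated groups of labeled points.
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]

near_origin = knn_predict(train, (0.5, 0.5))
near_far_group = knn_predict(train, (5.5, 5.5))
```

An odd k avoids tied votes in the two-class case.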

3.7. Linear Support Vector Machine (LSVM)

The LSVM algorithm addresses sample complexity by seeking separators with a “large margin”. Decision boundaries with large margins are preferred because they tend to exhibit a lower generalization error than models with narrow margins. The following procedures are used to train the SVM and classify new samples:

(a) Training LSVM:

Compute the Gram matrix.

$G_{i j} = y^{(i)} y^{(j)} x^{(i)} \cdot x^{(j)}$

(7)
Compute $\alpha^{*}$ as the solution of:

$\max_{\alpha} \left[\sum_{i = 1}^{N} \alpha_{i} - \frac{1}{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} \alpha_{i} \alpha_{j} y^{(i)} y^{(j)} x^{(i)} \cdot x^{(j)}\right] \quad \text{subject to} \quad \begin{cases} 0 \leq \alpha_{i} \leq C, \ \forall i, \\ \sum_{i = 1}^{N} \alpha_{i} y^{(i)} = 0 \end{cases}$

(8)

The value of parameter C controls the margin’s width.
Determine the weight.

$w^{*} = \sum_{i = 1}^{N} \alpha^{*}_{i} y^{(i)} x^{(i)}$

(9)
Calculate the intercept.

$w_{0}^{*} = 1 - \min_{i : y^{(i)} = 1} w^{* T} x^{(i)}$

(10)

(b) Classification LSVM (for a new sample x):

For each support vector $x^{(i)}$, compute:

$k_{i} = x \cdot x^{(i)}$

(11)
Calculate f(x).

$f(x) = w_{0}^{*} + \sum_{i} \alpha^{*}_{i} y^{(i)} k_{i} \ (= w_{0}^{*} + w^{* T} x)$

(12)
Examine sign(f(x)).
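
Equations (9) to (12) can be traced on a two-point toy problem. In the sketch below the dual solution `alpha` is not computed by a solver; it is an assumed value obtained by hand for this particular pair of points (maximizing 2α − 2α² gives α = 0.5), and the variable and function names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Toy training set: one positive and one negative sample.
X = [(2.0, 0.0), (0.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]  # assumed dual solution, derived by hand for this toy case

# Equation (9): weight vector from the dual coefficients.
w = [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(2)]

# Equation (10): intercept from the positive-class samples.
w0 = 1 - min(dot(w, xi) for xi, yi in zip(X, y) if yi == 1)

def decision(x):
    """Equations (11) and (12): f(x) built from inner products with the
    training samples that carry non-zero alpha."""
    k = [dot(x, xi) for xi in X]
    return w0 + sum(a * yi * ki for a, yi, ki in zip(alpha, y, k))

f_right = decision((3.0, 1.0))   # lies on the positive side
f_left = decision((0.5, 5.0))    # lies on the negative side
```

The sign of `decision(x)` gives the predicted class, and the kernel form agrees with the direct form $w_{0}^{*} + w^{* T} x$.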

3.8. Linear Discriminant Analysis (LDA)

LDA is widely used in machine learning and data analysis to reduce the dimensionality of a dataset. The objective is to determine an optimal linear transformation that maximizes the degree of separation between distinct classes. Suppose each class consists of $N_{k}$ samples in the real d-dimensional space, where k represents the class index ranging from 1 to c.

Let $X_{k}$ be the collection of d-dimensional samples for class k, represented as $\{x^{(1)}, x^{(2)}, \dots, x^{(N_{k})}\}$.
Let $X \in \mathbb{R}^{d \times N}$ be the data matrix, stacking all the samples from all classes, with each column representing a sample, where $N = \sum_{k} N_{k}$.
Samples x are projected onto a line, that is, a (c − 1)-dimensional space for c = 2: $z = w^{T} x = w \cdot x$, where $w \in \mathbb{R}^{d}$ is the projection vector.

Hence, the LDA algorithm aims to identify a linear projection that minimizes the scatter of the projected samples within each class while maximizing the separation between the projected class means.
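
For two classes, one standard way to obtain such a projection is Fisher's criterion, where w is proportional to $S_{W}^{-1}(\mu_{B} - \mu_{A})$ and $S_{W}$ is the within-class scatter matrix. The sketch below assumes that formulation (the text does not state which solver was used) and works on hand-made two-dimensional points; all names are illustrative.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(2)]

def scatter(points, mu):
    """2x2 scatter matrix: sum of (x - mu)(x - mu)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dx = [p[0] - mu[0], p[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += dx[r] * dx[c]
    return s

def fisher_direction(class_a, class_b):
    """Projection vector w = S_W^{-1} (mu_b - mu_a) for two classes."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[s_a[r][c] + s_b[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

class_a = [(1, 1), (2, 1), (1, 2), (2, 2)]
class_b = [(5, 5), (6, 5), (5, 6), (6, 6)]
w = fisher_direction(class_a, class_b)
z_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]  # projections z = w . x
z_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
```

After projection the two toy classes occupy disjoint intervals on the line, which is the separation the criterion aims for.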

Meta-Learner—Logistic Regression (LR)

LR serves as a stable, linear meta-learner that combines the outputs of the base models, and multiple researchers have used it in this role within stacked ensembles. Based on the literature study, LR is the most widely used meta-learner in stacked ensembles for imbalanced classification [40]. It estimates the logit of the probability of the response as a linear function of the predictors. Predictive modeling relies on this property to calculate the probability that a given instance is depressed or not depressed.

\text{logit}(P_{y}) = \log_{e}\left(\frac{P_{y}}{1 - P_{y}}\right) = \sum_{i = 0}^{k} \alpha_{i} X_{i}

(13)

The variable $P_{y}$ represents the probability of Y being equal to 1, corresponding to depression. Conversely, $1 - P_{y}$ is the probability of Y being equal to 0, indicating the not depressed class. The unknown regression coefficients, denoted as $\alpha_{i}$ (where i ranges from 0 to k), are the parameters to be estimated. In this context, k represents the total number of predictors, precisely 62 factors. The predictors themselves are denoted as $X_{i}$ (where i ranges from 1 to k), with $X_{0}$ equal to 1. The values of the regression coefficients are estimated by maximum likelihood, and the predicted probability is obtained with the following function.

\Phi(x) = P = \frac{1}{1 + e^{- x}}

(14)

The above formula maps the outcomes into the range between 0 and 1. Classification involves assigning weights to each feature of a sample, passing the weighted sum through Φ(x), and determining the sample’s class by comparing the result with a predefined threshold.
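
The mapping from weighted features to a class label can be sketched as follows. The coefficient values, the 0.5 threshold, and the function names are assumptions for illustration; in practice the coefficients come from maximum likelihood estimation, which is not shown.

```python
import math

def sigmoid(x: float) -> float:
    """Equation (14): maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_proba(coefficients, features):
    """coefficients[0] is the intercept, paired with X_0 = 1."""
    linear = coefficients[0] + sum(
        a * x for a, x in zip(coefficients[1:], features)
    )
    return sigmoid(linear)

def classify(coefficients, features, threshold=0.5):
    """Return 1 (depressed) if the probability reaches the threshold, else 0."""
    return int(predict_proba(coefficients, features) >= threshold)

coefficients = [-1.0, 2.0]          # hypothetical intercept and one weight
p = predict_proba(coefficients, [1.0])
log_odds = math.log(p / (1 - p))    # recovers the linear term of Equation (13)
```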

3.9. Depression Ensemble Stacking Generalization Model (DESGM)

Multiple studies have shown that the stacked ensemble model achieves high accuracy because it mitigates overfitting, a drawback associated with individual prediction models [20]. Initially, a series of distinct deep and machine learning models is trained on the training data; this stage is referred to as level 0. Passing the data through several machine learning algorithms reduces bias and also extracts and compresses the most important features.

At level 1, the outputs of all the trained level-0 models were merged and fed into a single meta-learning model to produce the final prediction. Figure 2 shows this final meta-model, which can assign different weights to the outputs of each trained level-0 model and can therefore improve overall predictive performance. Algorithm 2 employs stratified k-fold cross-validation to ensure that each fold's training and test data reflect the imbalanced class distribution. Oversampling balances the dataset by increasing the number of instances in the minority class. In this study, oversampling was applied exclusively to the training portion of the data during each cross-validation iteration. The stacking technique aggregates the results of multiple base classifiers using one meta-classifier [41]. The stacking process creates a meta dataset with the same number of instances as the original dataset. However, instead of the original input attributes, the meta dataset uses the outputs of the base models as its input features. The target variable remains unchanged from the original dataset.
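Restricting oversampling to the training portion of a fold can be sketched as follows. This is a minimal illustration that assumes simple random duplication of minority-class rows; the exact oversampling technique is not specified here, and the function name, toy data, and fold indices are hypothetical.

```python
import random


def oversample_minority(X, y, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for i, label in enumerate(y):
        by_class[label].append(i)
    minority, majority = sorted(by_class.values(), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 10 samples, 2 of them depressed (label 1).
X = [[float(i)] for i in range(10)]
y = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# One hypothetical cross-validation iteration: indices 8 and 9 are held out.
train_idx, test_idx = [0, 1, 2, 3, 4, 5, 6, 7], [8, 9]

# Oversample the training part only; the test part keeps its original distribution.
X_train, y_train = oversample_minority(
    [X[i] for i in train_idx], [y[i] for i in train_idx]
)
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]
print(y_train.count(0), y_train.count(1), y_test)
```

Because the held-out rows are never duplicated, the evaluation still reflects the original class imbalance, and no copy of a test sample can leak into the training data.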

The selection of the base model and meta-model is critical in the DESGM approach. As previously indicated, this study selected five base models for the ensemble model shown in Figure 2, including LSVM, ANN, KNN, LDA, and the perceptron method. The meta-learner employed in this study was LR. This method uses a meta-learner to determine base model reliability.

Algorithm 2: Depression Ensemble Stacking Generalization Algorithm (DESGM)
Input: Healthcare survey data $H_{0}$; dataset $S_{0} = \{x_{1}, x_{2}, \dots, x_{n}, y\}$, $x_{i} \in R^{m}$, $y \in \{0,1\}^{m}$; B base learners [LSVM, LDA, ANN, KNN, Perceptron], with each individual classifier optimized for each sub-dataset; meta-learner LR
Output: An optimized stacking ensemble classifier H
1	Using the questionnaire data $H_{0}$ as the base, construct the dataset $S_{0}$.
2	Aggregate features about seven psychological functions: $S_{0} = S_{s} \cup S_{ds} \cup S_{g} \cup S_{pa} \cup S_{ph} \cup S_{ph\_act} \cup S_{f}$
3	Feature selection using SBFS(A_i).
	for algorithm A_i in $H_{0}$
		return the sorted feature subset $(S_{s}, S_{ds}, S_{g}, S_{pa}, S_{ph}, S_{ph\_act}, S_{f})$
	end for
4	Using a 5-fold cross-validation procedure, randomly divide the dataset $S_{i} \in S_{0}$ into K subgroups of equal size.
		for each sub-dataset $S_{i} \in \{S_{s}, S_{ds}, S_{g}, S_{pa}, S_{ph}, S_{ph\_act}, S_{f}\}$ do
			for k = 1 to K do
				//4.1 Learn the level-0 classifiers.
				for t = 1 to T do
					Construct a classifier $h_{kt}$ using $S_{i} \setminus S_{k}$
				end for
				//4.2 Construct the level-1 training set.
				for $x_{i} \in S_{k}$ do
					$SB = []$
					$S_{n+}$ = Create a new instance $\{x_{i}^{'}, y_{i}\}$, where $x_{i}^{'} = \{h_{k1}(x_{i}), h_{k2}(x_{i}), \dots, h_{kT}(x_{i})\}$
				end for
			end for
		end for
5	Concatenate the SB produced by all base classifiers.
6	//Learn the level-1 classifier.
	Construct a new classifier $h^{'}$ using the obtained SB. //LR
7	Rebuild all of the level-0 classifiers.
	for t = 1 to 5 do
		Develop the classification model $h_{t}$ using $S_{t}$
	end for
return $H(x) = h^{'}(h_{1}(x), h_{2}(x), \dots, h_{5}(x))$
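Step 4 of Algorithm 2, in which the level-1 training set is assembled from out-of-fold predictions, can be sketched as follows. This is a simplified illustration under stated assumptions: the toy threshold learner is a hypothetical stand-in for the actual base learners (LSVM, LDA, ANN, KNN, and the perceptron), and the data and fold assignment are invented for the example.

```python
def make_threshold_learner(feature):
    """Toy level-0 learner (a stand-in, not one of the study's models).

    It predicts 1 when the chosen feature exceeds the midpoint between the
    class means observed in the training data.
    """
    def fit(X, y):
        pos = [row[feature] for row, label in zip(X, y) if label == 1]
        neg = [row[feature] for row, label in zip(X, y) if label == 0]
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
        return lambda row: int(row[feature] > cut)
    return fit


def out_of_fold_meta_features(X, y, learners, folds):
    """Build the level-1 dataset: one row per sample, one column per base learner."""
    meta = [None] * len(X)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Level 0: train every base learner without the held-out fold.
        models = [fit(X_tr, y_tr) for fit in learners]
        # Level 1 input: predictions for samples the models have not seen.
        for i in test_idx:
            meta[i] = [h(X[i]) for h in models]
    return meta


X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
folds = [[0, 3], [1, 4], [2, 5]]  # each fold holds one sample per class
learners = [make_threshold_learner(0), make_threshold_learner(1)]

meta_X = out_of_fold_meta_features(X, y, learners, folds)
print(meta_X)
```

The resulting `meta_X` has as many rows as the original dataset, and the labels `y` are reused unchanged, so a meta-learner such as LR could then be fitted on the pair (`meta_X`, `y`). Every row of `meta_X` comes from models that never saw that sample during training, which is what keeps the meta-learner from simply learning the base models' training-set fit.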

DESGM can be generated with a well-pre-processed dataset consisting of m samples and n features, denoted as $S = (x_{1}, x_{2}, \dots, x_{n}, y)$, where $x_{i} \in R^{m}$ and $y \in \{0,1\}^{m}$. It is possible to create a stacking ensemble model that accurately maps the features $\{x_{1}, x_{2}, \dots, x_{n}\}$ to the target variable y. This may be achieved by using h different types of baseline models within the stacking ensemble model H(x).

To assess our model’s effectiveness and address the issue of imbalanced data, we employed k-fold cross-validation. This method divides the dataset into k equal-sized groups (Figure 2). The model is trained on k − 1 groups and tested on the remaining group. This process is repeated k times, with each group serving as the test set once. This ensures that the model encounters a balanced representation of both common and rare cases, enhancing its ability to make accurate predictions. For this study, we used k values between 5 and 10.
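The fold construction can be sketched as follows. This is a minimal illustration of a stratified split, assuming a deterministic round-robin assignment within each class rather than random shuffling; the function name and the toy labels are hypothetical.

```python
def stratified_folds(y, k):
    """Assign sample indices to k folds so each fold keeps roughly the overall class ratio."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        # Deal the members of each class out to the folds in turn.
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return [sorted(fold) for fold in folds]


# Toy labels: 5 depressed (1) and 45 not depressed (0).
y = [1 if i % 10 == 0 else 0 for i in range(50)]
folds = stratified_folds(y, k=5)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    # Train on train_idx and evaluate on test_idx; each sample is tested exactly once.
    print(len(train_idx), len(test_idx), sum(y[i] for i in test_idx))
```

With these toy labels, every fold contains ten samples, exactly one of which is depressed, so the rare class appears in each test set instead of being concentrated in a few folds by chance.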

4. Results and Discussions

The objective of this research effort is to deploy prominent ensemble learning techniques to evaluate data and investigate the lifestyle of individuals to identify the elements contributing to the onset of depression.

4.1. Performance Metrics

The evaluation of predictive performance in this study was conducted using metrics such as F1 score, accuracy, recall, and precision.

The equations representing each evaluation index are shown below.

\mathrm{Recall}\ (R) = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Negative}}

(15)

\mathrm{Precision}\ (P) = \frac{\mathrm{True\ Positive}}{\mathrm{True\ Positive} + \mathrm{False\ Positive}}

(16)

\mathrm{Accuracy}\ (ACC) = \frac{TP + TN}{TP + TN + FP + FN}

(17)

F1\ \mathrm{score} = 2 \times \frac{P \times R}{P + R}

(18)
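Equations (15) to (18) can be computed directly from the four confusion-matrix counts, as in the sketch below. The first set of counts is hypothetical. The second set uses the class sizes in Table 2 to show what a trivial classifier that labels every participant as not depressed would score; returning 0 for an undefined ratio is a convention assumed here.

```python
def safe_div(numerator, denominator):
    """Return 0.0 when the ratio is undefined (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Recall, precision, accuracy, and F1 score from confusion-matrix counts."""
    recall = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, tn=88, fp=2, fn=2)

# Trivial "always not depressed" classifier on the Table 2 class sizes.
trivial = classification_metrics(tp=0, tn=4719, fp=0, fn=415)
print(example)
print(trivial)
```

The trivial classifier reaches an accuracy of about 0.92 while its recall and F1 score are 0, which illustrates why accuracy alone is not sufficient on a dataset this imbalanced.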

4.2. Results of Feature Selection Method

Features were selected to represent social lifestyle factors and those linked to depression diagnosis. From an initial set of 110 features, the IFEA method identified the most relevant ones. Three feature selection methods were compared: IFEA, GA, and MRMR (Table 3). By varying the number of features in each method, the optimal result was achieved with 62 features (Table 4). The general category encompasses personal details like gender, age, education, and marital status. This analysis reveals how different feature sets impact accuracy by adding or removing specific characteristics.

4.3. Results of Ensemble Model

Ensemble classifiers are expected to improve classification performance. This study analyzed the most prevalent ensemble strategies, including voting, boosting, and the proposed DESGM. A variety of ensemble classifiers were evaluated, encompassing homogeneous classifiers such as boosting and heterogeneous classifiers such as voting. Not all ensemble classifiers improved the ability to predict depression. In Figure 3, the voting classifiers showed improved results in the overall physical activity, general, and depression screener datasets. The boosting classifier achieved its highest performance when applied to the general and comprehensive datasets, resulting in an F1 score of 0.9612. The overall performance gained through boosting did not surpass that of a single classifier. This can be explained by the fact that boosting employs a random selection of feature subsets at each branch, which may be insufficient to significantly improve the predictive abilities of the models.

4.4. Results of DESGM

The proposed DESGM model integrates the decision outputs of various classification methods. Each of the sub-models makes an equal contribution to the ultimate aggregated prediction. A comparison of the output of this customized ensemble classifier with that of all the base classifiers shows that the proposed model yielded the best results.

A common practice among stacking classifiers is to employ logistic regression (LR) as the meta-classifier. Figure 4 reports the testing performance of each model in terms of F1 score, precision, recall, and accuracy, and compares the proposed model with two other ensemble models. Compared to the conventional ensemble model, the proposed model improved accuracy by 0.026 and F1 score by 0.02. Compared to the deep learning model, the proposed DESGM achieved higher precision and other metrics by a margin of 0.06. The DESGM outperformed the individual base classifiers and the other ensemble classifiers (Figure 4), with an accuracy of 0.9817 and an F1 score of 0.9904. Therefore, it can be inferred that the stacking ensemble classifier (Table 5) exhibits a substantial performance improvement compared to other classifiers when applied to the NHANES dataset.

The area under the curve (AUC), shown in Figure 5a, is a threshold-independent measure of how accurately depression is diagnosed. The receiver operating characteristic (ROC) graph shows that the AUC of ensemble voting increases slightly when the cut-off values are changed. This, in turn, changes the number of patients who are identified as having the disease. Even though XGBoost considerably increases the number of true positives, it still does not reach an AUC of 1.0. In contrast, the proposed DESGM obtains a true positive rate of 1, which means that no cases of depression are misclassified, as shown in Figure 6. To provide a comprehensive evaluation of the model's performance, we included metrics that are sensitive to class imbalance, such as the F1 score, precision, recall, and the area under the ROC curve. These metrics offer a balanced perspective on how well the model performs on both classes, beyond the overall accuracy.

4.5. Role of Feature Importance—Random Forest (RF)

Recently, variable importance measures derived from random forests have been proposed for selecting essential predictor variables in the analysis of medical data. One of the main benefits of random forest variable importance measures, compared to univariate screening methods, is that they consider both the effect of each predictor variable on its own and how the predictors interact with each other. Tree-based approaches like random forests can help find important predictor factors even in high-dimensional situations with many varied interactions. The PHQ-9 depression screening question does not evaluate the significance of this feature. The characteristics listed in Figure 7 have the strongest association with depression.

4.6. Discussion

Machine learning approaches are employed to elucidate the correlation between depression and the quality of life (QoL) characteristics that contribute to the onset of depression. XGBoost uses several strategies to manage overfitting, identify optimal splits, and handle missing values during the training phase. The variables identified as contributing to depression are derived from the feature importance estimates. The importance ranking (Figure 7) provided in this study primarily includes estimates of sleeping difficulties, lack of exercise and physical activities, overweight, perception of weight loss, and eating behavior. Prior research provides evidence of an association between mental problems and the usage of nicotine [42].

This study identified a link between lower monthly family income and the occurrence of depression, indicating that lower income is a contributing factor to the development of mood disorders. The study's primary findings indicate that individuals' regular sleep and waking times, as well as difficulty falling asleep and excessive sleep duration, were key factors associated with sleeping problems [43]. Previous studies [44,45,46] have demonstrated that sleep disturbances are a significant contributing factor to the development of depression. Extended periods of active video gameplay were also identified among the outcomes. It is hypothesized that an absence of vigorous physical activity and engagement in idle pursuits may contribute to depression. The perception of weight loss was also anticipated to be a contributing factor to the development of depression. Conversely, empirical evidence from a study demonstrates that those who adhere to a nutritious diet exhibit a significantly lower likelihood of experiencing depression [47]. Consuming nutritionally deficient meals can potentially be regarded as an indication of depressive symptoms. Among individuals diagnosed with alcohol dependency, 29% experienced at least one mood disorder within a year. Notably, the most prevalent affective disorder among these individuals was significant depression, reported by 28% of the respondents. Gender disparity is also important in understanding depression: its prevalence is almost twice as high in women as in men. The investigation's findings confirm a significant correlation between financial hardship, encompassing factors such as monthly income, wages, and other financial resources, and the presence of depression. A consistent association also exists between depression and an elevated risk of developing coronary heart disease.

5. Conclusions and Future Work

The present study, utilizing data from the NHANES, reveals a significant association between symptoms of depression and several parameters related to an individual's quality of life. To analyze the diverse range of data on mental health issues more effectively, a grouping methodology based on the SF-36 instrument, which measures quality of life, is proposed. In the experimental evaluation conducted with the NHANES 2015–16 dataset, less than 2% of cases were incorrectly classified as belonging to the depressed class, whereas no instances of depression were misclassified.

DESGM employs stratified k-fold cross-validation to address the class imbalance in the NHANES dataset, in which fewer than 10% of the records carry the depressed label. The iterative floating elimination feature selection algorithm, together with the consideration of different factors, makes it possible to reduce the feature set effectively and achieve an optimized outcome. Additionally, the assessment of feature relevance is a crucial component of this methodology. The random forest (RF) algorithm assigns the highest level of significance to the factors that contribute to the occurrence of depression. The present investigation identified a correlation between the anticipated variables related to quality of life and depression. The proposed methodology's findings demonstrate higher reliability than the most prominent state-of-the-art approaches and previously published research. Future work will focus on datasets from different countries and on identifying depression levels using deep learning methodology.

Author Contributions

Conceptualization, data collection, analysis and interpretation of the results, and draft manuscript preparation: A.S.; software validation, investigation, and supervision: L.M. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no specific funding for this study.

Institutional Review Board Statement

The NHANES dataset was collected with IRB approval and documented participant consent. (Source: https://www.cdc.gov/nchs/nhanes/irba98.htm, accessed on 1 August 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets can be accessed from the website of the Centers for Disease Control and Prevention at https://www.cdc.gov, accessed on 1 August 2024. It is available online via the following link: https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Questionnaire&CycleBeginYear=2015, accessed on 1 August 2024.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. Institute of Health Metrics and Evaluation. Global Health Data Exchange (GHDx). Available online: https://vizhub.healthdata.org/gbd-results/ (accessed on 4 March 2023).
2. Chakrabarty, T.; Sarfati, D.; Lam, R.W. Chapter 24—Neurobiological Aspects of Functional Recovery in Major Depressive Disorder. In Neurobiology of Depression; Academic Press: Cambridge, MA, USA, 2019; pp. 277–284.
3. Murthy, R.S. National Mental Health Survey of India 2015–2016. Indian J. Psychiatry 2017, 59, 21–26.
4. Kessler, R.C.; Aguilar-Gaxiola, S.; Alonso, J.; Chatterji, S.; Lee, S.; Ormel, J.; Üstün, T.B.; Wang, P.S. The global burden of mental disorders: An update from the WHO World Mental Health (WMH) surveys. Epidemiol. Psichiatr. Soc. 2009, 18, 23–33.
5. Sagar, R.S.; Mohan, D.; Kumar, V.; Khandelwal, S.K.; Nair, P.G. Physical illnesses among elderly psychiatric out-patients with depression. Indian J. Psychiatry 1992, 34, 41–45.
6. Nguyen, D.-K.; Chan, C.-L.; Li, A.-H.A.; Phan, D.-V. Deep Stacked Generalization Ensemble Learning models in early diagnosis of Depression illness from wearable devices data. In Proceedings of the 5th International Conference on Medical and Health Informatics (ICMHI '21), Association for Computing Machinery, New York, NY, USA, 7–12 May 2021.
7. Bohr, A.; Memarzadeh, K. The rise of artificial intelligence in healthcare applications. In Artificial Intelligence in Healthcare; Academic Press: Cambridge, MA, USA, 2020; pp. 25–60.
8. Aleem, S.; Huda, N.U.; Amin, R.; Khalid, S.; Alshamrani, S.S.; Alshehri, A. Machine Learning Algorithms for Depression: Diagnosis, Insights, and Research Directions. Electronics 2022, 11, 1111.
9. Tao, X.; Chi, O.; Delaney, P.J.; Li, L.; Huang, J. Detecting depression using an ensemble classifier based on Quality of Life scales. Brain Inf. 2021, 8, 2198–4026.
10. Merikangas, A.K.; Mendola, P.; Pastor, P.N.; Reuben, C.A.; Cleary, S.D. The association between major depressive disorder and obesity in US adolescents: Results from the 2001–2004 National Health and Nutrition Examination Survey. J. Behav. Med. 2011, 35, 149–154.
11. Kroenke, K.; Spitzer, R.L.; Williams, J.B.; Löwe, B. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: A systematic review. Gen. Hosp. Psychiatry 2010, 32, 345–359.
12. Gerych, W.; Agu, E.; Rundensteiner, E. Classifying depression in imbalanced datasets using an autoencoder-based anomaly detection approach. In Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, 30 January–1 February 2019; pp. 124–127.
13. Oh, J.; Yun, K.; Maoz, U.; Kim, T.S.; Chae, J.H. Identifying depression in the National Health and Nutrition Examination Survey data using a deep learning algorithm. J. Affect. Disord. 2019, 257, 623–631.
14. Dipnall, J.F.; Pasco, J.A.; Berk, M.; Williams, L.J.; Dodd, S.; Jacka, F.N.; Meyer, D. Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM). Eur. Psychiatry 2017, 39, 40–50.
15. Sharma, A.; Verbeke, W.J.M.I. Improving Diagnosis of Depression with XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081). Front. Big Data 2020, 3, 15.
16. Wang, L.; Chignell, M.; Jiang, H.; Lokuge, S.; Mason, G.; Fotinos, K.; Katzman, M. Prioritization of Multi-level Risk Factors, and Predicting Changes in Depression Ratings after Treatment Using Multi-Task Learning. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 3239–3244.
17. Wang, L.; Zhu, D.; Towner, E.; Dong, M. Obesity risk factors ranking using multi-task learning. In Proceedings of the 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), Las Vegas, NV, USA, 4–7 March 2018; pp. 385–388.
18. Jayawickreme, N.; Atefi, E.; Jayawickreme, E.; Qin, J.; Gandomi, A.H. Association Rule Learning Is an Easy and Efficient Method for Identifying Profiles of Traumas and Stressors that Predict Psychopathology in Disaster Survivors: The Example of Sri Lanka. Int. J. Environ. Res. Public. Health 2020, 17, 2850.
19. Jia, Z.; Li, X.; Yuan, X.; Zhang, B.; Liu, Y.; Zhao, J.; Li, S. Depression is associated with diabetes status of family members: NHANES (1999–2016). J. Affect. Disord. 2019, 249, 121–126.
20. Byeon, H. Developing a Predictive Model for Depressive Disorders Using Stacking Ensemble and Naive Bayesian Nomogram: Using Samples Representing South Korea. Front. Psychiatry 2022, 12, 773290.
21. Habib, M.; Wang, Z.; Qiu, S.; Zhao, H.; Murthy, A.S. Machine Learning Based Healthcare System for Investigating the Association Between Depression and Quality of Life. IEEE J. Biomed. Health Inform. 2022, 26, 2008–2019.
22. Thanathamathee, P. Boosting with feature selection technique for screening and predicting adolescents depression. In Proceedings of the 2014 Fourth International Conference on Digital Information and Communication Technology and its Applications (DICTAP), Bangkok, Thailand, 6–8 May 2014; pp. 23–27.
23. Lee, S.-J.; Moon, H.-J.; Kim, D.-J.; Yoon, Y. Genetic algorithm-based feature selection for depression scale prediction. In Proceedings of the GECCO '19: Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13–17 July 2019; pp. 65–66.
24. Santana, R. Genetic Algorithms for Feature Selection in the Children and Adolescents Depression Context. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1470–1475.
25. Yu, B.; Zhang, X.; Wang, C.; Sun, M.; Jin, L.; Liu, X. Trends in depression among Adults in the United States, NHANES 2005–2016. J. Affect. Disord. 2019, 263, 609–620, ISSN 0165-0327.
26. Thieme, A.; Belgrave, D.; Doherty, G. Machine learning in mental health: A systematic review of the hci literature to support the development of effective and implementable ML systems. ACM Trans. Comp. Hum. Interact (TOCHI) 2020, 27, 1–53.
27. Wang, Y.; Lopez, J.M.S.; Bolge, S.C.; Zhu, V.J.; Stang, P.E. Depression among people with type 2 diabetes mellitus, US National Health and Nutrition Examination Survey (NHANES), 2005–2012. BMC Psychiatry 2016, 16, 88.
28. Burns, L.; Teesson, M. Alcohol use disorders comorbid with anxiety, depression, and drug use disorders. Findings from the Australian National Survey of Mental Health and Well Being. Drug Alcohol. Depend. 2002, 68, 299–307.
29. Bromet, E.; Andrade, L.H.; Hwang, I.; Sampson, N.A.; Alonso, J.; de Girolamo, G.; de Graaf, R.; Demyttenaere, K.; Hu, C.; Iwata, N.; et al. Cross-national epidemiology of DSM-IV major depressive episode. BMC Med. 2011, 9, 90.
30. Ferrari, A.J.; Charlson, F.J.; Norman, R.E.; Patten, S.B.; Freedman, G.; Murray, C.J.; Vos, T.; Whiteford, H.A. Burden of depressive disorders by country, sex, age, and year: Findings from the global burden of disease study 2010. PLoS Med. 2013, 10, e1001547.
31. Vincent, P.M.D.R.; Mahendran, N.; Nebhen, J.; Deepa, N.; Srinivasan, K.; Hu, Y.-C. Performance Assessment of Certain Machine Learning Models for Predicting the Major Depressive Disorder among IT Professionals during Pandemic times. Comput. Intell. Neurosci. 2021, 2021, 9950332.
32. Dipnall, J.F.; Pasco, J.A.; Berk, M.; Williams, L.J.; Dodd, S.; Jacka, F.N.; Meyer, D. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression. PLoS ONE 2016, 11, e0148195.
33. Hassan, A.U.; Hussain, J.; Hussain, M.; Sadiq, M.; Lee, S. Sentiment analysis of social networking sites (SNS) data using machine learning approach for the measurement of depression. In Proceedings of the 2017 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 18–20 October 2017; pp. 138–140.
34. Yang, H.; Bath, P.A. Automatic Prediction of Depression in Older Age. In Proceedings of the third International Conference on Medical and Health Informatics 2019, ICMHI 2019, Xiamen, China, 17–19 May 2019; pp. 36–44.
35. Tanha, J.; Abdi, Y.; Samadi, N.; Razzaghi, N.; Asadpour, M. Boosting methods for multi-class imbalanced data classification: An experimental review. J. Big Data 2020, 7, 70.
36. Hsieh, W.-H.; Shih, D.-H.; Shih, P.-Y.; Lin, S.-B. An Ensemble Classifier with Case-Based Reasoning System for Identifying Internet Addiction. Int. J. Environ. Res. Public Health 2019, 16, 1233.
37. Kumar, U.K.; Nikhil, M.B.S.; Sumangali, K. Prediction of breast cancer using voting classifier technique. In Proceedings of the 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), Chennai, India, 2–4 August 2017; pp. 108–114.
38. Centers for Disease Prevention and Control National Health and Nutrition Examination Survey Overview. National Center for Health Statistics. 2017. Available online: https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Questionnaire&CycleBeginYear=2015 (accessed on 15 March 2023).
39. Byeon, H. Exploring Factors for Predicting Anxiety Disorders of the Elderly Living Alone in South Korea Using Interpretable Machine Learning: A Population-Based Study. Int. J. Environ. Res. Public Health 2021, 18, 7625.
40. Yusta, S.C. Different metaheuristic strategies to solve the feature selection problem. Pattern Recognit. Lett. 2009, 30, 525–534.
41. Zian, S.; Kareem, S.A.; Varathan, K.D. An Empirical Evaluation of Stacked Ensembles with Different Meta-Learners in Imbalanced Classification. IEEE Access 2021, 9, 87434–87452.
42. Pasco, J.A.; Williams, L.J.; Jacka, F.N.; Ng, F.; Henry, M.J.; Nicholson, G.C.; Kotowicz, M.A.; Berk, M. Tobacco Smoking as a Risk Factor for Major Depressive Disorder: Population-based Study. Br. J. Psychiatry 2008, 193, 322–326.
43. Riemann, D.; Krone, L.B.; Wulff, K.; Nissen, C. Sleep, insomnia, and depression. Neuropsychopharmacology 2020, 45, 74–89.
44. Yan, C.; Shum, D.; Deng, C.P. Responses to academic stress mediate the association between sleep difficulties and depressive/anxiety symptoms in Chinese adolescents. J. Affect. Disord. 2020, 263, 89–98.
45. Cahuas, A.; He, Z.; Zhang, Z.; Chen, W. Relationship of physical activity and sleep with depression in college students. J. Am. Coll. Health 2020, 68, 557–564.
46. Francis, H.M.; Stevenson, R.J.; Chambers, J.R.; Gupta, D.; Newey, B.; Lim, C.K. A brief diet intervention can reduce symptoms of depression in young adults—A randomised controlled trial. PLoS ONE 2019, 14, e0222768.
47. Lai, J.S.; Hiles, S.; Bisquera, A.; Hure, A.J.; McEvoy, M.; Attia, J. A systematic review and meta-analysis of dietary patterns and depression in community-dwelling adults. Am. J. Clin. Nutr. 2014, 99, 181–197.

Figure 1. Architecture diagram for depression prediction.

Figure 2. DESGM on a 5-fold cross-validation of all feature subsets.

Figure 3. Performance analysis of each subset and overall feature set using different ensemble methods.

Figure 4. DESGM: performance analysis on each feature subset.

Figure 5. (a,b): Comparison of all classifiers (ML and ensemble) in the overall dataset through accuracy and AUC.

Figure 6. Confusion matrix of the DESGM model on the test dataset.

Figure 7. Risk factors for depression.

Table 1. Feature selection method used in related studies.

Dataset	No. of Features	Feature Selection
NHANES (1999–2014) [14]	157	Reduce multi-collinearity, Remove self-reported depression questions
KLoSA [23], UFMG [24]	50, 55	Genetic algorithm
NHANES (2005–16) [25]	20	Features taken from factors based on related work and the PHQ-9 depression questionnaire
NHANES (2005–2012) [26]	51	Features selected based on Type 2 diabetes mellitus and those factors correlated with depression
Thasala Hospital [22]	12	Max-Relevance Min-Redundancy (MRMR)

Table 2. Proportion of depressive and non-depressed participants.

Class Label	No. of Participants
Depressed	415
Not depressed	4719

Table 3. Feature selection based on a different algorithm.

Feature Selection	No. of Features	DESGM Accuracy
IFEA	110	0.9251
GA	110	0.8547
MRMR	110	0.8214
IFEA	95	0.9136
GA	95	0.8765
MRMR	95	0.8355
IFEA	80	0.9495
GA	80	0.9125
MRMR	80	0.8674
IFEA	70	0.9554
GA	70	0.9210
MRMR	70	0.8956
IFEA	62	0.9687
GA	62	0.9286
MRMR	62	0.9015
IFEA	55	0.9512
GA	55	0.9124
MRMR	55	0.8967

Table 4. Features and subset functionality.

IFEA Algorithm
Feature Subset	No of Features
Socio-Life Style	9
General	10
Pain	13
Physical Health Status	7
Physical Activity	10
Food Behaviour	4
Depression Screener	9
Overall	62

Table 5. Comparing classification effectiveness with previously published studies.

Published Research Paper	Year	Techniques Used	Accuracy
Jihoon Oh et al. [13]	2019	Deep learning algorithm	92.00%
Amita Sharma and Willem J [15]	2020	Extreme gradient boosting	90.00%
Xiaohui Tao et al. [9]	2021	Ensemble model	95.40%
Masood Habib et al. [21]	2022	Posterior probability multi-class Support vector machine	91.16%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Selvaraj, A.; Mohandoss, L. Enhancing Depression Detection: A Stacked Ensemble Model with Feature Selection and RF Feature Importance Analysis Using NHANES Data. Appl. Sci. 2024, 14, 7366. https://doi.org/10.3390/app14167366

