Article

Predicting Depression during the COVID-19 Pandemic Using Interpretable TabNet: A Case Study in South Korea

Department of Digital Anti-Aging Healthcare (BK21), Inje University, Gimhae 50834, Republic of Korea
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(14), 3145; https://doi.org/10.3390/math11143145
Submission received: 25 June 2023 / Revised: 14 July 2023 / Accepted: 15 July 2023 / Published: 17 July 2023

Abstract

COVID-19 has further aggravated the problem of depression by compelling people to stay indoors and limit social interactions. This study aimed to construct a TabNet model combined with SHapley Additive exPlanations (SHAP) to predict depression in South Korean society during the COVID-19 pandemic. We used a tabular dataset extracted from the Seoul Welfare Survey with a total of 3027 samples. The TabNet model was trained on this dataset, and its performance was compared to that of several other machine learning models, including Random Forest, eXtreme Gradient Boosting, Light Gradient Boosting, and CatBoost. According to the results, the TabNet model achieved an area under the receiver operating characteristic curve (AUC) of 0.9957 on the training set and an AUC of 0.9937 on the test set. Additionally, the study investigated the TabNet model’s interpretability using SHAP to provide post hoc global and local explanations for the proposed model. By combining the TabNet model with SHAP, our proposed model might offer a valuable tool for professionals in social fields and psychologists, allowing those without expert knowledge of data analysis to easily comprehend the decision-making process of this AI model.

1. Introduction

Depression has emerged as a significant global issue that continues to burden the global health agenda, affecting populations all around the world. The World Health Organization (WHO) estimates that over 350 million individuals, accounting for approximately 4.4% of the world’s population, suffer from depression [1]. It is predicted that by 2030, depression will become the leading health concern worldwide [1]. The outbreak of COVID-19 has further aggravated the problem, because people were compelled to stay indoors and limit their social interactions, leading to a worsening of the depression situation [2]. The prevalence of depression in the overall population during the pandemic has been estimated at 33% [1]. In addition to impacting physical health, COVID-19 also contributes to several mental disorders [3]. However, early detection and appropriate treatment have shown promising results in mitigating their impact. Consequently, there is an urgent and indisputable need to develop effective methods and instruments for monitoring mental health.
Advanced computing approaches, such as machine learning and deep learning, have emerged as promising ways of improving individual mental health outcomes [4]. In particular, machine learning algorithms have shown their effectiveness at determining relationships between depression and co-occurring disorders, as well as correctly forecasting the occurrence of depression [5]. In another study, logistic regression effectively predicted treatment-resistant depression in the context of the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) cohort trial [6]. Similarly, approaches based on machine learning have been used to predict treatment outcomes for depression in cross-trial [7]. In addition, machine learning techniques have been used to identify depression-related biomarkers in datasets such as the National Health and Nutrition Examination Survey (NHANES) [8] and to predict the persistence and severity of depression based on self-report data [9]. Deep learning algorithms have also significantly aided in the identification of depression. Using EEG signals and visual indicators, such as facial appearance and images, deep learning algorithms have been utilized to detect depression [10,11,12]. Furthermore, patients with depression have been identified with success using multimodal techniques combining video, audio, and text streams [13]. These studies indicate that machine learning and deep learning could play crucial roles in future mental health care.
While deep learning models excel at learning complex data structures, such as in the case of image, language, or even audio datasets, they tend to generalize less effectively on tabular datasets compared to traditional machine learning models [14]. To overcome this limitation, Arik et al. [15] introduced the TabNet model. TabNet, a novel deep tabular learning approach, has been demonstrated to possess exceptional efficiency and explainability. It employs a distinctive encoder design that establishes a tree-like output manifold and utilizes feature masks to elucidate the decision-making process at each stage [15]. As a result, TabNet exhibits both high accuracy and strong interpretability in tabular learning tasks. This architecture has been widely applied in various domains, including medicine [16], environmental science [17], and finance [18]. However, to the best of our knowledge, there have been no reported applications of TabNet to address depression prediction problems.
The lack of interpretability poses a significant challenge when machine learning and deep learning models are entrusted with making critical decisions that impact individuals’ well-being [19]. To address this issue, explainable artificial intelligence (XAI) has emerged, aiming to enhance the understandability of complex “black-box” models for human users. One approach within XAI is the model-agnostic technique, which focuses on providing post hoc explanations for “black-box” models [20]. A commonly used model-agnostic technique is SHapley Additive exPlanations (SHAP) [21], which leverages concepts from game theory and employs Shapley values to compute feature importance scores for each prediction made by a “black-box” model. SHAP has been applied in various research studies [22,23,24] related to machine learning models for depression prediction.
In this study, we aim to construct an explainable deep learning model suitable for tabular data, specifically a TabNet model combined with SHAP, to predict depression in South Korea during the COVID-19 pandemic. This research article features contributions in several key respects:
  • First, we employ a tabular dataset extracted from the Seoul Welfare Survey, consisting of 3027 samples, to train and evaluate the TabNet model. By utilizing real-world data from South Korea, our study enhances the applicability and relevance of the predictive model within the specific regional context.
  • Secondly, we compare the performance of the TabNet model with other commonly used machine learning models, including Random Forest (RF) [25], eXtreme Gradient Boosting (XGBoost) [26], Light Gradient Boosting (LightGBM) [27], and CatBoost [28]. This comparative analysis provides insights into the strengths and weaknesses of the TabNet model for predicting depression.
  • Thirdly, we investigate the interpretability of the TabNet model by employing SHAP, enabling post hoc global and local explanations for the proposed model. This makes it possible to evaluate and understand the model’s predictions.

2. Materials and Methods

2.1. Used Materials

The Seoul Welfare Survey was conducted by the Seoul Institute to gather data on the welfare status and needs of the residents of Seoul. Its purpose was to represent the characteristics of Seoul and provide a basis for the design and monitoring of citywide welfare policies, establishing reliable welfare statistics based on a representative sample of Seoul residents. The survey received approval from the IRB in Statistics Korea (No-202017399). At the time of the survey, the sample consisted of 3000 households in Seoul. The sampling approach employed a multi-stage stratified proportional distribution method using the sampling strata from the 2018 Population and Housing Census of Statistics Korea: 300 sample tracts were selected using the multi-stage stratified proportional distribution method, and 10 households per tract were selected using stratified sampling. The sample size of 3000 households was chosen to control the margin of error to within 1.8% at a 95% confidence level. The survey used a structured questionnaire covering demographic characteristics, economic activity, income and expenditure, health status, social activities, and housing conditions, administered through face-to-face household interviews. Computer-assisted personal interviewing (CAPI) was the primary survey method, but paper questionnaires were also used to address potential quarantine issues such as lockdowns during the COVID-19 pandemic. In the 2020 survey, approximately 68.7% of the households that completed the survey were interviewed using CAPI, while 31.3% were interviewed using paper questionnaires. In total, 3027 completed surveys were included in the study. The demographic make-up of the participants is depicted in Table 1.

2.2. Method

In order to develop a high-performance explainable prediction model for depression, several steps were carried out in this study, as shown in Figure 1. The dataset was first preprocessed to ensure its suitability for analysis. Next, an important feature selection process was conducted to identify the most relevant features for depression prediction. Subsequently, the TabNet model was trained and evaluated, along with other machine learning models such as Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting (LGBM), and CatBoost. Finally, to obtain insights into the TabNet model’s predictions and enhance its interpretability, the SHAP method was utilized. Detailed explanations of each method and model used in the study are provided in the following subsections.

2.2.1. Data Preprocessing

The data were subjected to preprocessing steps to handle missing values, recategorize the target variable, and address data imbalance. Since the Seoul Welfare Survey included optional questions, the dataset contained missing values that required processing. Although the survey did not have a specific question directly identifying depression, it included the CESD-R-10 [29] questions, which are a widely used self-report measure for assessing depression. Based on the responses to the CESD-R-10, the target variable “depression” was constructed. After determining the target variable, the final preprocessing phase consisted of addressing the problem of imbalance within the dataset.
Step 1.
Handling missing values:
Initially, the dataset consisted of 3027 samples and 659 features. As the dataset comprised survey responses, certain questions were optional and were not answered by all participants, resulting in a higher proportion of missing values in specific columns, often exceeding 50%. Since these columns represented non-essential or supplementary information rather than variables crucial for our analysis, their exclusion was justified. By removing columns with a high proportion of missing values, we focused our analysis on the core variables that had been answered by a significant number of participants. This approach ensured the reliability and meaningfulness of the remaining data while still capturing the key aspects necessary for our research objectives. Accordingly, columns with more than 50% null values were removed, reducing the dataset to 3027 samples and 228 variables.
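This step can be sketched in a few lines of pandas (the column names and toy data are hypothetical; only the 50% threshold comes from the study):

```python
import pandas as pd
import numpy as np

def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose fraction of missing values exceeds `threshold`."""
    keep = df.columns[df.isna().mean() <= threshold]
    return df[keep]

# Toy example: 4 samples, one optional question that is 75% missing.
toy = pd.DataFrame({
    "age": [34, 51, 28, 60],
    "optional_q": [np.nan, np.nan, np.nan, 2.0],  # 3/4 missing -> dropped
})
reduced = drop_sparse_columns(toy)
```

Applied to the full survey table, the same filter would take the frame from 659 columns down to the 228 retained variables.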
Step 2.
Recategorizing the target feature:
The CESD-R-10 (Center for Epidemiologic Studies Depression Scale Revised 10-item version) is a self-report scale used to assess depressive symptomatology in the general population [29]. It consists of 10 items that ask respondents to rate the number of days they felt specific emotions or performed certain behaviors in the last week on a 4-point Likert scale. The response options range from 0 (rarely or none of the time) to 3 (all of the time). The scores for the 10 items are summed to evaluate depressive symptomatology, with a score of 10 or higher indicating clinical depression. The CESD-R-10 has demonstrated strong internal consistency (Cronbach’s α = 0.86) and high test–retest reliability (ICC = 0.85) [30].
In addition to the CESD-R-10, the Seoul Welfare Survey included a similar question about a week-long experience of not wanting to eat and having no appetite, which utilized the same 4-point Likert scale. We calculated the total score by summing the responses to the 11 items, including the 10 CESD-R-10 items and the additional question. In this study, the target feature was generated based on the total score, with individuals scoring 10 or higher being considered to be depressed (labeled as 1), while those scoring below 10 were categorized as being non-depressed (labeled as 0). After creating the target feature, the 11 items’ columns were removed from the dataset. As a result, the dataset consisted of 3027 samples, 216 variables, and 1 target feature column named “depression”.
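The scoring and labeling described above can be sketched as follows (the item column names are hypothetical; the 11-item sum and the cutoff of 10 follow the text):

```python
import pandas as pd

# Hypothetical item columns: 10 CESD-R-10 items plus the extra appetite
# question, each scored 0-3 for the past week.
ITEM_COLS = [f"cesd_{i}" for i in range(1, 11)] + ["no_appetite"]

def add_depression_label(df: pd.DataFrame, cutoff: int = 10) -> pd.DataFrame:
    """Sum the 11 items, binarize at `cutoff`, and drop the item columns."""
    total = df[ITEM_COLS].sum(axis=1)
    df = df.drop(columns=ITEM_COLS)
    df["depression"] = (total >= cutoff).astype(int)  # 1 = depressed, 0 = non-depressed
    return df

toy = pd.DataFrame([{c: 1 for c in ITEM_COLS},   # total 11 -> depressed
                    {c: 0 for c in ITEM_COLS}])  # total 0  -> non-depressed
labeled = add_depression_label(toy)
```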
Step 3.
Addressing class imbalance:
The target feature exhibited an imbalance in the distribution of non-depressed (label 0) and depressed (label 1) samples, with a ratio of 2589:438. To address this class imbalance, we employed the SMOTE-ENN technique [31], which combines over-sampling and under-sampling methods. SMOTE (Synthetic Minority Over-sampling Technique) [32] is an oversampling technique that aims to increase the representation of minority classes (depressed labels) by creating synthetic instances. It analyzes the instances of the minority class, selects the k-nearest neighbors, and generates synthetic instances in the feature space on the basis of the selected neighbors. On the other hand, ENN (Edited Nearest Neighbor) [33] is an under-sampling technique that eliminates misclassified observations from both classes. It identifies the k-nearest neighbors of each observation and compares the majority class of the observation’s neighbors with the class of the observation itself. If the majority class of the neighbors differs from the observation’s class, both the observation and its neighbors are removed from the dataset. By combining SMOTE and ENN, we rebalanced the dataset by increasing the representation of the minority class using synthetic instances and removing misclassified instances from both classes.
Before employing the SMOTE-ENN method, the dataset was divided into a training set and a test set in a ratio of 80:20. There were 2421 samples in the training set and 606 samples in the test set. The SMOTE-ENN approach was only applied to the training set to ensure the model learned from the rebalanced data while preserving the representative nature of the test set. For implementing the SMOTE-ENN technique, we utilized the “SMOTEENN” class from the “imblearn.combine” module version 0.10.1 in Python. To specify the parameters for SMOTEENN, we set “sampling_strategy = ‘all’” to resample all classes. After employing the SMOTE-ENN method, the rebalanced training set consisted of 3354 samples, with a ratio of 1328 non-depressed (label 0) to 2026 depressed (label 1) individuals.

2.2.2. Feature Selection

Following the preprocessing stage, the dataset consisted of 228 variables. Not all of these variables were necessarily significant for classification. Including a large number of variables increases computational time and resource usage, adds complexity, and often contributes little or no valuable information, potentially even diminishing performance [34]. Thus, it is crucial to pare down the number of features to achieve optimal results.
In this study, our objective was to train five models, including TabNet, and four conventional machine-learning models for the purposes of comparison, including RF, XGBoost, LGBM, and CatBoost. We aimed to identify significant features that enhance the performance of not only TabNet, but also the comparison models. Figure 2 presents the feature selection strategy employed in this study. All models utilized their default hyperparameters during the feature selection process.
Due to the fact that RF, XGBoost, LGBM, and CatBoost are tree-based models, we opted to use the BorutaShap [35] method for significant feature selection. BorutaShap is an efficient Python-based wrapper method that merges the Boruta feature selection algorithm [36] with SHAP. BorutaShap is only compatible with and supports tree-based learners as the base model [37]. In the process of identifying the most influential features, the Boruta algorithm generates shadow features, exact replicas of the original features, and systematically randomizes their values to eliminate any correlations with the response variable [37]. These original and shadow-shuffled features are then employed in the tree-based model to predict the target variable, leveraging the strength of the respective learner. Consequently, the algorithm calculates the permutation importance or mean decrease accuracy (M) for both the actual features and the shadow-shuffled inputs for all trees (stree). The formulation capturing this calculation can be expressed as follows:
M = [Σt∈OOB I(yt = f(xt)) − Σt∈OOB I(yt = f(xtn))] / (|OOB| × stree)
where xt represents a group of predictor variables (xt ∈ Rn), yt ∈ R denotes the corresponding target variable for the inputs in the set T (where t = 1, 2, …, T), the function I acts as an indicator, OOB refers to the Out-of-Bag sample, f(xt) signifies the predicted value prior to permutation, f(xtn) corresponds to the predicted value after permutation, and stree is the number of trees. The algorithm evaluates the z-score by conducting a two-sided hypothesis test (t-test) to test the equality of the actual and shadowed values [38]. The z-score is computed as follows:
z-score = M/S
where S represents the standard deviation of accuracy losses. The algorithm sets a threshold where the z-score of the actual feature must exceed the maximum z-score (zmax) of the randomized shadow features. Only when the z-score of the actual feature surpasses this threshold is the feature regarded as significant. Moreover, the algorithm conducts comparative analyses between the features and their corresponding shadow features, evaluating their Shapley importance values (SHAP values), which yields a more consistent outcome [37]. In this study, we employed the BorutaShap package in Python (available at https://github.com/Ekeany/Boruta-Shap, accessed on 1 June 2023) to select important features from each model (RF, XGBoost, LGBM, and CatBoost). The “n_trials” parameter, which determines the number of iterations of the BorutaShap algorithm, was set to 50 for each model to evaluate the importance of features.
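BorutaShap itself requires the package linked above, but the shadow-feature comparison at its core can be illustrated with a small sketch that uses Random Forest impurity importances in place of SHAP values (a simplification; the dataset and selection rule here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# With shuffle=False, the 3 informative features are the first 3 columns.
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

# Shadow features: a copy of X with every column independently shuffled,
# destroying any correlation with the response variable.
X_shadow = X.copy()
for col in range(X_shadow.shape[1]):
    rng.shuffle(X_shadow[:, col])
X_aug = np.hstack([X, X_shadow])

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_aug, y)
real_imp = rf.feature_importances_[:X.shape[1]]
shadow_max = rf.feature_importances_[X.shape[1]:].max()

# A feature is tentatively 'important' only if it beats the best shadow.
selected = [i for i in range(X.shape[1]) if real_imp[i] > shadow_max]
```

Boruta repeats this comparison over many trials (here, `n_trials = 50`) and keeps features that consistently beat the strongest shadow feature.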
However, due to the deep learning architecture of TabNet, the feature selection for this classifier was conducted separately. We utilized TabNet’s built-in ‘feature_importances_’ function to calculate the importance score of all features. After conducting several trials with different importance score thresholds, we decided to select only those features with an importance score equal to or greater than 0.01. This threshold allowed us to focus on the most significant features in TabNet, ensuring that we consider the key variables contributing to the model’s predictions.
Upon acquiring the set of significant features from each model, a voting mechanism was employed to collectively identify the most influential features. A model cast a ‘vote’ for a feature if it was part of the model’s significant feature list. Thus, the votes received by a feature represent its significance across the various models, and the vote count for each feature varied between one (indicating its selection by one model) and five (indicating its selection by all five models).
From this voting mechanism, we derived five feature subsets labeled ‘voting_1’ through ‘voting_5’. The ‘voting_1’ subset comprised every feature that received at least one vote, and each subsequent subset (‘voting_2’ through ‘voting_5’) consisted of the features that received at least two, three, four, and five votes, respectively. To assess the impact of these feature subsets on model performance, each of the five models was cross-validated in turn on each subset, from ‘voting_1’ to ‘voting_5’. The aim was to identify the optimal feature subset meeting two conditions. First, each model trained on the feature subset had to achieve an area under the receiver operating characteristic curve (AUC) higher than that obtained using all features. Second, at least one model had to show a significantly better AUC on that subset than on the other feature subsets. The feature subset satisfying these conditions was selected for training and optimizing the models.
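The voting mechanism can be sketched in a few lines (the per-model significant-feature lists are hypothetical):

```python
from collections import Counter

# Hypothetical significant-feature lists from the five selection runs.
selected = {
    "RF":       {"income", "health", "age", "social"},
    "XGBoost":  {"income", "health", "age"},
    "LightGBM": {"income", "health", "housing"},
    "CatBoost": {"income", "age"},
    "TabNet":   {"income", "health", "social"},
}

# Each model casts one vote per feature on its list.
votes = Counter(f for feats in selected.values() for f in feats)

# voting_k: every feature chosen by at least k of the five models.
subsets = {f"voting_{k}": sorted(f for f, v in votes.items() if v >= k)
           for k in range(1, 6)}
```

Here `voting_5` would contain only "income" (chosen by all five models), while `voting_1` contains every feature any model selected.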

2.2.3. TabNet Model

TabNet is an innovative architecture for deep learning designed specifically for tabular data that offers a high level of both performance and interpretability [15]. TabNet’s architecture consists of a succession of steps, in each of which, a D-dimensional feature vector is processed. At each step, the input is passed through a Feature Transformer block. This block comprises multiple layers, which can be either shared across all decision steps or specific to a single step. Within the block, there are fully connected layers, a batch normalization layer, and a Gated Linear Unit (GLU) activation function. In addition, the GLU is connected to a residual normalization connection, which serves to reduce network variance. The multi-layered block facilitates feature selection and improves the network’s parameter efficacy. TabNet’s architecture is described in detail in Figure 3.
The Feature Transformer is linked to both the Attentive Transformer and the Mask, orchestrating a comprehensive feature selection at every stage. The Attentive Transformer is a multi-layered structure, comprising fully connected layers and batch normalization layers. The formulation of the Attentive Transformer and masking process is given by Equation (3):
M[i] = sparsemax(P[i − 1] × hi(a[i − 1]))
where a[i − 1] denotes the processed features from the preceding step, P[i − 1] is the prior scale, and hi represents a trainable function composed of a fully connected layer and batch normalization.
The Attentive Transformer is characterized by two crucial elements: the sparsemax activation function and the prior. Sparsemax introduces sparsity into the feature vectors, followed by projecting these features onto a probabilistic mapping in the Euclidean space, thus decreasing dimensionality. Post projection, each feature vector is associated with a probability, which aids in enhancing model interpretability. The prior scale term, P[i], signifies the prominence of a feature across previous steps and is given by Equation (4):
P[i] = ∏j=1…i (γ − M[j])
where γ is a relaxation parameter that controls whether a feature is enforced at a single decision step or across multiple steps. When γ = 1, a feature can be used at only one decision step; as γ increases, the feature may be used across multiple decision steps.
The Attentive Transformer chooses the most conspicuous features to shape the transformed feature vector and directs these features towards the trainable Mask, M[j]. The Mask fosters interpretability and further refines the feature selection from the Attentive Transformer. When Mbj [i] (which defines the jth feature of the bth sample) is equal to 0, it implies that the feature contributes no value at that step. Gathering these Masks at each stage produces a coefficient that gauges the significance of each step in the final decision-making process.
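The sparsemax activation and the prior-scale update of Equation (4) can be sketched numerically (the logits and γ value are illustrative, with fixed toy logits standing in for the trainable function hi):

```python
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Sparsemax of Martins & Astudillo (2016): a sparse softmax alternative."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum          # which entries stay nonzero
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z            # threshold
    return np.maximum(z - tau, 0.0)

# One attentive-transformer step on toy logits: the mask sums to 1 and
# zeroes out weak features entirely.
gamma = 1.3
prior = np.ones(4)                    # P[0] = 1 for every feature
logits = np.array([2.0, 1.0, -1.0, -2.0])
mask = sparsemax(prior * logits)      # M[i] = sparsemax(P[i-1] * hi(a[i-1]))
prior = prior * (gamma - mask)        # P[i] = prod_{j<=i} (gamma - M[j])
```

Features that a mask uses heavily get a smaller prior scale (here, feature 0 drops from 1.0 to 0.3), discouraging later steps from reusing them.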
In this research, pytorch_TabNet version 4.0 was utilized to construct the TabNet model. The TabNet model’s hyperparameters were fine-tuned using the Optuna [39] framework. The hyperparameters of the TabNet model, optimized for this research, are as follows: ‘n_d’: 32, ‘n_a’: 64, ‘gamma’: 1.0529025412526656, ‘momentum’: 0.3777743510901338, ‘mask_type’: ‘entmax’, ‘patienceScheduler’: 8, ‘epochs’: 218.

2.2.4. Machine Learning Models

  • Random Forest (RF):
The Random Forest [25] (RF) algorithm is a machine learning strategy that employs “bagged” trees. RF relies on an ensemble of decision trees, each of which is constructed using randomly selected samples and predictors from the training dataset. These samples are distributed independently and identically. In contrast to conventional decision trees, RF takes a different approach by considering only a random subset of features during each iteration as opposed to using all available features for dividing. By implementing this selective feature subset strategy, each predictor can contribute to the construction of tree structures, resulting in reduced prediction variance and enhanced model accuracy compared to merely bagging the training samples. The following are the hyperparameters of the RF model, as optimized by Optuna: ‘class_weight’: ‘balanced’, ‘criterion’: ‘entropy’, ‘max_depth’: 9, ‘max_features’: 0.4, ‘min_impurity_decrease’: 1e-09, ‘min_samples_leaf’: 6, ‘min_samples_split’: 10, ‘n_estimators’: 300.
  • eXtreme Gradient Boosting (XGBoost):
Extreme Gradient Boosting [26] (XGBoost) is a highly effective implementation of the gradient boosting method, designed to discover nonlinear relationships between input variables and outputs within a boosting ensemble framework. It excels at precisely capturing and learning complex and nonlinear relationships [39,40,41]. XGBoost employs boosting techniques to improve the ensemble’s predictive ability by integrating the strengths of multiple weak classifiers. XGBoost is distinguished by its capacity to manage diverse and massive datasets, as well as by its optimization techniques, which reduce memory consumption and computational complexity. It employs a variety of regularization strategies to prevent overfitting and improve generalizability. The following hyperparameters of the XGBoost model were obtained following optimization using Optuna: ‘objective’: ‘binary:logistic’, ‘colsample_bytree’: 0.904628, ‘learning_rate’: 0.112842, ‘max_depth’: 7, ‘min_child_weight’: 1, ‘n_estimators’: 286, ‘reg_alpha’: 2.344177, ‘reg_lambda’: 10.0, ‘scale_pos_weight’: 50.0, ‘subsample’: 0.610825.
  • Light Gradient Boosting (LightGBM):
LightGBM [27] is a performant and scalable implementation of the gradient boosting algorithm for machine learning. It employs two sophisticated methods, namely gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). In contrast to conventional gradient boosting implementations such as XGBoost, LightGBM employs leaf-wise rather than level-wise tree growth. In leaf-wise tree growth, LightGBM grows the tree vertically, concentrating on expanding the leaves that contribute the most to reducing the overall loss. This vertical expansion strategy enables LightGBM to prioritize essential splits and achieve convergence more rapidly, whereas most other algorithms extend their tree structures horizontally by successively adding levels. Figure 4 illustrates the benefit of leaf-wise tree growth. In this study, the LGBM model’s hyperparameters, optimized by Optuna, are as follows: ‘bagging_fraction’: 0.69664, ‘bagging_freq’: 7, ‘feature_fraction’: 0.882759, ‘learning_rate’: 0.194762, ‘min_child_samples’: 100, ‘min_split_gain’: 0.309977, ‘n_estimators’: 86, ‘num_leaves’: 256, ‘reg_alpha’: 1 × 10−10, ‘reg_lambda’: 10.0, ‘subsample_for_bin’: 200,000.
  • CatBoost:
CatBoost [28], which stands for “categorical boosting”, is an algorithm for machine learning that concentrates on categorical columns within datasets. To effectively manage categorical features, it employs permutation techniques, one_hot_max_size (OHMS), and target-based statistics. CatBoost employs a greedy method at every split in the current tree to resolve the exponential growth of feature combinations caused by categorical variables. CatBoost follows these steps for categorical features with more categories than the OHMS parameter:
  • Randomly dividing the records into subsets.
  • Converting the variable’s label to an integer value.
  • Transforming categorical attributes into numerical values.
The transformation of categorical features to numerical values involves calculating the average target value (avrTarget) for each category based on the following formula:
avrTarget = (count_In_Class + prior)/(total_Count + 1)
where count_In_Class represents the number of positive instances (ones) in the target for a given categorical feature, total_Count is the number of previous objects, and prior is a value specified by the starting parameters. In this research, the CatBoost model’s hyperparameters, optimized by Optuna, are as follows: ‘depth’: 2, ‘l2_leaf_reg’: 10, ‘random_strength’: 0.167311, ‘n_estimators’: 163, ‘eta’: 0.248041.
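The avrTarget transformation can be sketched as follows. The one-pass accumulation over *previous* records mirrors CatBoost's ordered target statistics and is our reading of "the number of previous objects" above; the data and prior value are illustrative:

```python
# avrTarget = (count_In_Class + prior) / (total_Count + 1), per the formula above.
def avr_target(count_in_class: int, total_count: int, prior: float) -> float:
    return (count_in_class + prior) / (total_count + 1)

# Encode each record from the statistics of records seen before it.
categories = ["A", "B", "A", "A", "B"]
targets    = [1,   0,   1,   0,   1]
prior = 0.5
seen = {}       # category -> [count_in_class, total_count] so far
encoded = []
for cat, t in zip(categories, targets):
    cnt, tot = seen.get(cat, [0, 0])
    encoded.append(avr_target(cnt, tot, prior))
    seen[cat] = [cnt + t, tot + 1]
```

With no history, every category starts at prior/(0 + 1) = 0.5; later occurrences of "A" move toward its observed positive rate.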

2.2.5. Evaluation Metrics

In this study, various evaluation metrics are employed to accurately assess the performance of each algorithm. The selected metrics include accuracy, precision, recall, and F1-score. Before delving into these metrics, it is essential to understand the relevant entries of the confusion matrix, which include:
  • True Positive (TP): The number of instances correctly classified as “depressed” (label 1) by the model.
  • True Negative (TN): The number of instances correctly classified as “non-depressed” (label 0) by the model.
  • False Positive (FP): The number of instances incorrectly classified as “depressed” by the model.
  • False Negative (FN): The number of instances incorrectly classified as “non-depressed” by the model.
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1-score = 2 × (Precision × Recall)/(Precision + Recall)
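The four formulas above can be computed directly from the confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for the "depressed" (label 1) class.
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```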
Furthermore, the area under the receiver operating characteristic curve (AUC) was utilized as an additional performance metric to evaluate the models. The AUC score is a reliable performance measure, as it is not dependent on a specific cutoff value for classification [42]. A higher AUC indicates better predictive performance of a model. In this study, the model with the highest AUC was considered to have the best prediction ability. In situations where multiple models achieved the same AUC value, the model with the highest F1 score was deemed to be superior.
For the implementation of our research, we used a Jupyter Notebook and employed Python 3.11.3 as the software environment. Specifically, we employed the ‘sklearn’ package to implement the machine learning models (RF, XGBoost, LGBM, CatBoost), and the ‘pytorch’ package to implement the TabNet model. We conducted our experiments on an Apple MacBook Pro 14 equipped with an Apple M1 Max chipset and 32 GB of RAM. To assess the performance of all models, we employed the stratified 10-fold cross-validation technique. In stratified k-fold cross-validation, the dataset is divided into k sections of equal size, and cases are randomly assigned to each section. The stratified cross-validation ensures that each subset maintains a distribution of class labels that is similar to the initial dataset. During the k evaluations of the model, each subset is used once as the validation set, while the remaining subsets are used for training. This approach allows for comprehensive model evaluation, as the model is validated on different subsets of the data, ensuring a more robust estimate of its generalization ability.
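The stratified 10-fold scheme can be sketched with scikit-learn (the toy labels mimic an imbalanced binary target; the exact 80:20 ratio here is illustrative):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced binary target: 80 non-depressed, 20 depressed.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_counts = [np.bincount(y[val_idx], minlength=2).tolist()
               for _, val_idx in skf.split(X, y)]
# Every validation fold preserves the class ratio: [8, 2] in each fold.
```

Because stratification assigns each class's samples evenly across folds, every validation subset mirrors the original label distribution, which is what makes the cross-validated estimate robust for imbalanced data.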

2.2.6. SHapley Additive exPlanations (SHAP)

SHapley Additive exPlanations (SHAP) [21] was applied to the best prediction model in this study in order to obtain a better understanding of the underlying reasons behind its predictions. SHAP assigns an importance score to each input feature for a specific prediction using principles from cooperative game theory. In game theory, rules regulate how players interact; each player has a strategy and receives rewards, and individual participants’ contributions to the game are determined using Shapley values. In the context of model explanation, the strategies correspond to the outcomes of the procedures, the participants to the features, and the reward to the quality of the obtained results. Shapley values thus identify the contribution of a specific feature to the overall prediction, and the sampling procedure can be repeated to refine the marginal contribution approximations. Ultimately, the SHAP value is defined as the weighted average of the marginal contributions across all possible coalitions of features, and is given as:
φ_i(f) = ∑_{S ⊆ F \ {i}} [|S|! × (|F| − |S| − 1)! / |F|!] × [f(S ∪ {i}) − f(S)]
where φ_i(f) represents the Shapley value of feature i, i.e., its weighted average marginal contribution over all coalitions that exclude it. F denotes the set of all features (with |F| their number), S ranges over the subsets (coalitions) of features that exclude feature i, f(S ∪ {i}) indicates the model prediction when feature i is included, and f(S) represents the model prediction without feature i.
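For small feature sets, the formula can be evaluated exactly by enumerating every coalition. The sketch below does this for a toy additive model; the model and contribution values are purely illustrative.

```python
from itertools import combinations
from math import factorial

def shapley_value(f, features, i):
    """Exact Shapley value of feature i for model f, enumerating all coalitions S ⊆ F \\ {i}."""
    others = [j for j in features if j != i]
    n = len(features)
    phi = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            # Coalition weight: |S|! (|F| - |S| - 1)! / |F|!
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            # Marginal contribution of i to coalition S
            phi += weight * (f(set(S) | {i}) - f(set(S)))
    return phi

# Toy model: the prediction is the sum of the contributions of the present features.
contrib = {"a": 2.0, "b": 3.0}
f = lambda S: sum(contrib[j] for j in S)
phi_a = shapley_value(f, ["a", "b"], "a")  # for an additive model, φ_a equals a's own contribution
```

For purely additive models, each feature's Shapley value recovers exactly its own contribution, and the values sum to the difference between the full prediction and the empty-coalition baseline (the "efficiency" property).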
In this paper, we utilized three SHAP visualization techniques, namely the SHAP global bar plot, the SHAP beeswarm plot, and the SHAP force plot, to investigate the relationships between individuals with depression and the features derived from the survey data. The SHAP global bar plot provided a comprehensive view of feature importance by displaying the mean absolute value of each feature’s importance across all samples. This plot allowed us to identify the most influential features in the model’s predictions. For a more detailed analysis of how the top features impact the model’s output, we employed the SHAP beeswarm plot. This plot effectively summarizes the information by representing each explanation instance as a single dot on the feature row. The position of each dot along the x-axis was determined using the corresponding SHAP value, while the density of dots along each feature row provided an indication of the magnitude of the impact. To better understand how each feature contributes to shifting the model’s output from the base value prediction (i.e., the mean prediction across the training set) to the specific model output, we employed the SHAP force plot. In this plot, features that push the prediction higher are depicted in red, while features that push the prediction lower are depicted in blue. This visualization allowed us to obtain insights into the directional impact of each feature on the model’s output. To implement these SHAP visualization techniques, we utilized the SHAP Python package developed by Slundberg (version 0.41.0, available at https://github.com/slundberg/shap, accessed on 1 June 2023).

3. Results and Discussion

3.1. Evaluation of Feature Subsets

When we applied BorutaShap to select the important features for the RF, XGBoost, LGBM, and CatBoost models, 138, 43, 58, and 83 important features were obtained, respectively. For the TabNet model, 28 key characteristics were identified. The details of the important features for each model are shown in Figures S1–S5 (see Supplementary Materials). Subsequently, we utilized the voting mechanism for feature selection and generated five feature subsets, labeled as ‘voting_1’, ‘voting_2’, ‘voting_3’, ‘voting_4’, and ‘voting_5’. Table 2 displays the AUC scores for all of the models, along with their default hyperparameters, evaluating the impact of the feature subsets chosen by each individual model and the voting mechanism.
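The voting mechanism is not spelled out in detail here. Assuming that each subset ‘voting_n’ collects the features selected by at least n of the five models, it can be sketched as follows; the per-model selections shown are hypothetical.

```python
from collections import Counter

def voting_subsets(selected_per_model):
    """Build voting_n subsets: features chosen by at least n of the models (assumed rule)."""
    votes = Counter(feat for feats in selected_per_model.values() for feat in set(feats))
    n_models = len(selected_per_model)
    return {f"voting_{n}": sorted(feat for feat, v in votes.items() if v >= n)
            for n in range(1, n_models + 1)}

# Hypothetical per-model BorutaShap selections (feature codes are illustrative)
selected = {
    "RF":       ["code62", "code574", "code286"],
    "XGBoost":  ["code62", "code574"],
    "LGBM":     ["code62", "code635"],
    "CatBoost": ["code62", "code574", "code659"],
    "TabNet":   ["code62"],
}
subsets = voting_subsets(selected)  # e.g. voting_5 keeps only features picked by all five models
```

Under this rule, voting_1 is the union of all selections and voting_5 their intersection, so the subsets shrink monotonically as the vote threshold rises.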
When using all features, the AUC scores for RF, XGBoost, LGBM, CatBoost, and TabNet were 0.9759, 0.9732, 0.9744, 0.9711, and 0.8927, respectively. Among the individual model’s feature subsets, the RF and LGBM models achieved the highest AUC using CatBoost features, with AUC scores of 0.9769 and 0.9766, respectively. The XGBoost model obtained the highest AUC of 0.9767 using RF features. On the other hand, the CatBoost and TabNet models both achieved the highest AUC using XGBoost features, with scores of 0.9763 and 0.9247, respectively.
Among the five feature subsets obtained using the voting mechanism, with the exception of the voting_5 feature subset, the other feature subsets achieved higher AUC values than when using all features for all models. The differences in AUC values between RF (min: 0.9765, max: 0.9770), XGBoost (min: 0.9760, max: 0.9765), LGBM (min: 0.9760, max: 0.9766), and CatBoost (min: 0.9759, max: 0.9761) models when using the voting_1, voting_2, voting_3, and voting_4 feature subsets were relatively small, ranging from 0.0002 to 0.0006. However, there was a significant disparity in the AUC of the TabNet model when using the voting_4 subset. The TabNet model achieved an AUC of 0.9302 for this subset, which was considerably higher than the AUC of 0.8927 when using all features. Furthermore, the AUC differences among the different feature subsets (voting_1, voting_2, voting_3, voting_4) for the TabNet model (min: 0.9151, max: 0.9302) ranged from 0.0039 to 0.0151, indicating notable variability in performance. Therefore, we selected the features in the voting_4 subset, including 36 features, in order to train and optimize the prediction models in this research. The details of these features are described in Table 3.
Feature selection methods play a crucial role in improving prediction accuracy and computational efficiency by identifying the most relevant features for a specific problem domain [43,44]. In our research, the results of the feature selection procedure had a substantial effect on both the TabNet model and the other machine learning models. In particular, for the TabNet model with default hyperparameters, feature selection enhanced the AUC from 0.8927 (using all features) to 0.9302 (using the optimal subset of selected features). It is essential to note that there is no universally optimal method for feature selection, as the performance of these techniques can vary based on the characteristics of the dataset [44]. Consequently, further research in analyzing this dataset using machine learning or deep learning should place greater emphasis on exploring various feature selection methods.

3.2. Evaluation of Optimized Models

Table 4 presents the predictive performance of five optimized models (TabNet, RF, XGBoost, LGBM, and CatBoost) on the training set and test set using a selection of 36 important features. TabNet obtained the highest AUC score of 0.9957 on the training set, surpassing all other models. The AUC values of the XGBoost, LGBM, CatBoost, and RF models were 0.9947, 0.9947, 0.9946, and 0.9940, respectively. TabNet also outperformed the other models in terms of F1-score, accuracy, and precision. With respect to F1-score, TabNet achieved 0.9752, followed by XGBoost (0.9724), LGBM (0.9723), RF (0.9716), and CatBoost (0.9706). In terms of accuracy, TabNet achieved a score of 0.9702, followed by XGBoost and LGBM (0.9666), RF (0.9657), and CatBoost (0.9645). Similarly, in terms of precision, TabNet achieved a precision of 0.9792, followed by LGBM and RF (0.9729), CatBoost (0.9723), and XGBoost (0.9716). However, TabNet had a slightly lower recall score than XGBoost and LGBM. XGBoost achieved the highest recall, with a value of 0.9734, followed by LGBM (0.9719), TabNet (0.9714), RF (0.9704), and CatBoost (0.9689).
On the test set, TabNet continued to demonstrate superior performance. It achieved an AUC of 0.9937, outperforming RF (0.9874), XGBoost (0.9848), LGBM (0.9835), and CatBoost (0.9832). In terms of F1-score, TabNet achieved a score of 0.9403, followed by RF (0.9287), CatBoost (0.9265), XGBoost (0.9249), and LGBM (0.9220). TabNet also achieved the highest accuracy, with a value of 0.9604, followed by RF (0.9521), CatBoost (0.9505), XGBoost (0.9488), and LGBM (0.9472). Similarly, in terms of precision, TabNet obtained the highest value of 0.9356, followed by RF (0.9130), CatBoost (0.9087), LGBM (0.9000), and XGBoost (0.8967). XGBoost achieved the highest recall with a value of 0.9550, while the other models (TabNet, LGBM, CatBoost, and RF) had a recall of 0.9450. Consistent with the results obtained on the training set, TabNet outperformed the other models in terms of AUC, accuracy, precision, and F1-score on the test set. The confusion matrix in Figure 5 shows that TabNet had a true positive rate of 94% and a true negative rate of 97% on the test set, further validating the effectiveness of the TabNet model on this dataset.
In a recent study by Hosseinzadeh Kasani et al. [45], an XGBoost model was constructed to identify depression using a dataset from the Korean National Health and Nutrition Examination Survey (K-NHANES) [46] with 4804 samples. That XGBoost model achieved high performance, with accuracy and AUC values of 0.8602 and 0.8534, respectively; the TabNet model proposed in our study outperformed it in terms of both accuracy and AUC. In another study, Zulfiker et al. [47] investigated six machine learning classifiers using socio-demographic and psychosocial information from a revised version of the Burns Depression Checklist (BDC) survey to detect depression. The AdaBoost model demonstrated the highest performance among their models, with an accuracy of 0.9254 and an AUC of 0.96; our proposed TabNet model also surpassed this performance. Through this indirect comparison, our TabNet model demonstrates superior performance in predicting depression outcomes using tabular data extracted from socio-demographic and psychosocial surveys. To strengthen the robustness of these results, future research should validate the model experimentally on a real-world platform in Seoul, South Korea, making it possible to assess its generalizability and practical implications in a specific geographic context.
The hyperparameter optimization carried out in this study also produced remarkable outcomes. The performance of the TabNet model was significantly enhanced by optimizing the model’s hyperparameters. With optimized hyperparameters, the AUC of the TabNet model increased from 0.9302 (using the default hyperparameters) to 0.9957. Nonetheless, optimizing the hyperparameters for deep learning models such as TabNet, with complex structures and a large number of hyperparameters, can be difficult. Due to the exhaustive search space involved, it often requires significant computational resources [48,49]. The optimization of hyperparameters is essential for fine-tuning model performance and attaining optimal results [48]. Given the computational challenges associated with hyperparameter optimization for deep learning models, it is even more critical to address this issue in future studies involving larger datasets.
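The hyperparameter search described above can be illustrated with a minimal random-search loop, a simple stand-in for a full optimization study with a dedicated framework. The search space and objective below are illustrative only, not the study's actual configuration.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample n_trials configurations from `space` and keep the one maximizing `objective`."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Illustrative search space loosely mirroring common TabNet hyperparameters
space = {"n_d": [8, 16, 32, 64], "n_steps": [3, 5, 7], "gamma": [1.0, 1.3, 1.5, 2.0]}
# Stand-in objective; in practice this would be the cross-validated AUC of the trained model
objective = lambda cfg: 1.0 - abs(cfg["n_d"] - 32) / 100 - abs(cfg["gamma"] - 1.3)
best_cfg, best_score = random_search(objective, space, n_trials=200)
```

Each trial here is cheap, but in the real setting each objective evaluation requires a full cross-validated training run, which is why hyperparameter optimization for deep models is so computationally demanding.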

3.3. Interpretation of TabNet with SHAP

For the global interpretation of the TabNet model, Figure 6a,b depicts the importance ranking of depression risk characteristics among the elderly and how the values of those characteristics influence the risk. Figure 6a presents the characteristics that contribute significantly to the depression prediction results. The top six features making the greatest contributions are code62 (“Experience of not being able to heat due to heating costs”), code574 (“Feeling of happiness the day before the survey”), code286 (“Willingness to participate in the future—to vote in elections”), code659 (“Head of household”), code635 (“Current emotional state compared to pre-COVID-19”), and code67 (“Recognition of housing welfare-related projects—support for housing purchase funds (loans)”). According to Figure 6b, “Experience of not being able to heat due to heating costs” has an important influence on depression; those unable to heat their homes due to heating costs have a higher risk of depression. In contrast, greater values of “Feeling of happiness the day before the survey” are associated with a reduced risk of depression. This finding is intuitive and convincing, as it aligns with the notion that individuals who experience happiness are less likely to be prone to depression.
In addition to the global interpretation, Figure 7 and Figure 8 depict two instances of local explanations derived from the SHAP force plot, showing the TabNet model’s predictions for samples 1 and 2, respectively. For the first sample, the model predicted that the individual had no risk of depression. In actuality, the participant did not suffer from depression, confirming the proposed model’s prediction. This person’s characteristics can be described as:
  • No. of private medical insurance subscriptions per household member (code140): 1
  • Head of household (code659): High school or less (3)
  • Feeling of happiness the day before the survey (code574): level 5 (5)
  • Experience of not being able to heat due to heating costs (code62): No (2)
  • Current emotional state compared to pre-COVID-19 (code635): 50/100 score
  • Recognition of housing welfare-related projects—support for housing purchase funds (loans) (code67): Some knowledge of the content (3)
  • Level of satisfaction with benefits—emergency disaster support fund (code640): Satisfied (4)
  • Level of household help—emergency disaster relief funds (code642): Slightly helpful (4)
  • Recognition—Seoul-type basic security system (code297): Some knowledge of the content (3)
  • Whether or not to use Seoul disaster emergency living expenses (code639): Not accepted (3)
  • Policy areas that Seoul should focus on the most (1st priority) (code593): Housing policy (7)
  • Lack of food expenses (code193)—worrying about food: Never (3)
For the second sample, the model predicted that the person was at risk of depression. This aligned with the actual condition of the individual, who had depression, consistent with the model’s prediction results. The characteristics of this individual were:
  • Whether or not to use Seoul disaster emergency living expenses (code639): Received (2)
  • The most necessary support for work/family balance (1st priority) (code262): Strengthen maternity and parental leave (1)
  • Household assistance level—Seoul disaster emergency living expenses (code643): Normal (3)
  • Level of household help—emergency disaster relief funds (code642): Normal (3)
  • Level of satisfaction with benefits—emergency disaster support fund (code640): Normal (3)
  • Willingness to donate in the future (code288): No (1)
  • Residential welfare-related projects intent to use in the future—support for housing purchase funds (loans) (code94): None (1)
  • Current emotional state compared to pre-COVID-19 (code635): 20/100 score
  • Recognition of housing welfare-related projects—support for housing purchase funds (loans) (code67): I’ve heard of it, but I don’t know what it is (2)
  • Willingness to participate in the future—to vote in elections (code286): No (2)
  • Feeling of happiness the day before the survey (code574): level 7 (7)
  • Experience of not being able to heat due to heating costs (code62): No (2)
  • Current number of children (code225): 3
  • Expected number of children (code226): 3
By combining the TabNet model with SHAP, this study validates the predictive accuracy of the interpretable TabNet deep learning model for depression outcomes. When applied to tabular data derived from social welfare surveys, the proposed model outperforms conventional machine learning models such as RF, XGBoost, LGBM, and CatBoost, according to the results. In addition, to the best of our knowledge, this study is the first application of the TabNet model to address problems related to depression prediction. Moreover, the TabNet model, in combination with SHAP, offers a valuable tool for professionals in social fields and psychologists who may not possess extensive expertise in data analysis. This facilitates easy comprehension of the decision-making process of the AI model, thereby enhancing its practical applicability in the field.

4. Limitations and Further Research

The limitations of this research are as follows. First, training and optimizing the TabNet model required significantly more time than the machine learning models, and training on larger datasets may require additional hardware resources. Second, the original dataset contained 659 variables, a large dimensionality; to improve the performance of the prediction model, an in-depth analysis of several feature selection methods will be necessary in order to select an optimal and insightful subset of features. Third, our dataset was derived from a single region (Seoul, South Korea), potentially limiting generalizability: a single regional dataset may not be representative of the broader population, leading to biased results, and extrapolating findings to other regions or populations may be challenging due to unique factors influencing the data. To mitigate these concerns, future research should consider expanding the data sources to include national or international databases, thereby ensuring greater diversity and representativeness in the sample. Fourth, in this study the TabNet model was utilized as a “black-box” model and combined with SHAP to provide interpretability. While TabNet itself is an interpretable tabular learning model, our purpose was to leverage TabNet in combination with SHAP to create a tool that professionals in social fields and psychologists without extensive data analysis expertise can easily comprehend; therefore, an explanation of the TabNet model’s decision-making process based on feature importance masking was not a primary focus of this research. Finally, the explanations provided by SHAP for depression prediction using the TabNet model need to be evaluated by professionals in social fields, psychologists, and psychiatrists.
These professionals will be able to assess the relevance and significance of the identified features and their impact on depression prediction. Their expertise will help validate the interpretability of the TabNet model and ensure that the explanations align with existing knowledge and theories in the field of mental health.

5. Conclusions

COVID-19 further aggravated mental health problems by compelling people to stay indoors and limit their social interactions, leading to a worsening of the depression situation. Consequently, there is an urgent need for the development of effective methods and instruments to monitor mental health, both during and beyond the pandemic. This research introduces a highly effective TabNet model combined with SHAP, which offers a valuable tool for use by professionals in social fields and psychologists for the prediction of depression on the basis of social welfare survey data during the COVID-19 pandemic. We used a tabular dataset extracted from the Seoul Welfare Survey, which consisted of a total of 3027 samples. The TabNet model was trained on this dataset, and its performance was compared to that of several other machine learning models, including RF, XGBoost, LGBM, and CatBoost. The results showed that the proposed TabNet model demonstrated strong performance, with an AUC of 0.9957, an accuracy of 0.9702, a precision of 0.9792, a recall of 0.9714, and an F1-score of 0.9752 on the training dataset. On the test dataset, it achieved an AUC of 0.9937, an accuracy of 0.9604, a precision of 0.9356, a recall of 0.9450, and an F1-score of 0.9403. These results showcase the model’s ability to accurately predict depression outcomes. In addition, the model not only achieves excellent performance, but also provides easily comprehensible explanations, making it possible for users to rapidly understand the model’s predictions and interpretations. By combining the TabNet model with SHAP, our proposed model might help professionals in social fields and psychologists to gain deeper insights into the factors driving depression predictions. Such individuals will be able to leverage their specialized knowledge and expertise to verify the reliability of the model, and to effectively apply it in the prediction of depression on the basis of social welfare survey data. 
This application might potentially support the addressing of depression-related issues, ultimately enhancing society’s quality of life.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math11143145/s1.

Author Contributions

Conceptualization, H.V.N. and H.B.; software, H.V.N.; methodology, H.V.N. and H.B.; validation, H.V.N. and H.B.; investigation, H.V.N. and H.B.; writing—original draft preparation, H.V.N.; formal analysis, H.B.; writing—review and editing, H.B.; visualization, H.V.N.; supervision, H.B.; project administration, H.B.; funding acquisition, H.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-RS-2023-00237287, NRF-2021S1A5A8062526), and by local government-university cooperation-based regional innovation projects (2021RIS-003).

Institutional Review Board Statement

This study was carried out in accordance with the Helsinki Declaration and was approved by the Korea Workers’ Compensation and Welfare Service’s Institutional Review Board (or Ethics Committee) (protocol code 0439001, date of approval 31 January 2018).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because researchers must obtain permission from the Korea Centers for Disease Control and Prevention.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Depression and Other Common Mental Disorders: Global Health Estimates; World Health Organization: Geneva, Switzerland, 2017. [Google Scholar]
  2. Latif, S.; Usman, M.; Manzoor, S.; Iqbal, W.; Qadir, J.; Tyson, G.; Castro, I.; Razi, A.; Boulos, M.N.K.; Weller, A.; et al. Leveraging Data Science to Combat COVID-19: A Comprehensive Review. IEEE Trans. Artif. Intell. 2020, 1, 85–103. [Google Scholar] [CrossRef]
  3. Nguyen, H.V.; Byeon, H. Explainable Deep-Learning-Based Depression Modeling of Elderly Community after COVID-19 Pandemic. Mathematics 2022, 10, 4408. [Google Scholar] [CrossRef]
  4. Bzdok, D.; Meyer-Lindenberg, A. Machine Learning for Precision Psychiatry: Opportunities and Challenges. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 2018, 3, 223–230. [Google Scholar] [CrossRef] [Green Version]
  5. Van Loo, H.M.; Cai, T.; Gruber, M.J.; Li, J.; de Jonge, P.; Petukhova, M.; Rose, S.; Sampson, N.A.; Schoevers, R.A.; Wardenaar, K.J.; et al. Major depressive disorder subtypes to predict long-term course. Depress. Anxiety 2014, 31, 765–777. [Google Scholar] [CrossRef] [Green Version]
  6. Perlis, R.H. A Clinical Risk Stratification Tool for Predicting Treatment Resistance in Major Depressive Disorder. Biol. Psychiatry 2013, 74, 7–14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Chekroud, A.M.; Zotti, R.J.; Shehzad, Z.; Gueorguieva, R.; Johnson, M.K.; Trivedi, M.H.; Cannon, T.D.; Krystal, J.H.; Corlett, P.R. Cross-Trial Prediction of Treatment Outcome in Depression: A Machine Learning Approach. Lancet Psychiatry 2016, 3, 243–250. [Google Scholar] [CrossRef] [PubMed]
  8. Dipnall, J.F.; Pasco, J.A.; Berk, M.; Williams, L.J.; Dodd, S.; Jacka, F.N.; Meyer, D. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression. PLoS ONE 2016, 11, e0148195. [Google Scholar] [CrossRef] [Green Version]
  9. Kessler, R.C.; van Loo, H.M.; Wardenaar, K.J.; Bossarte, R.M.; Brenner, L.A.; Cai, T.; Ebert, D.D.; Hwang, I.; Li, J.; de Jonge, P.; et al. Testing a Machine-Learning Algorithm to Predict the Persistence and Severity of Major Depressive Disorder from Baseline Self-Reports. Mol. Psychiatry 2016, 21, 1366–1371. [Google Scholar] [CrossRef] [Green Version]
  10. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adeli, H.; Subha, D.P. Automated EEG-Based Screening of Depression Using Deep Convolutional Neural Network. Comput. Methods Programs Biomed. 2018, 161, 103–113. [Google Scholar] [CrossRef]
  11. Zhou, X.; Jin, K.; Shang, Y.; Guo, G. Visually Interpretable Representation Learning for Depression Recognition from Facial Images. IEEE Trans. Affect. Comput. 2020, 11, 542–552. [Google Scholar] [CrossRef]
  12. Zhu, Y.; Shang, Y.; Shao, Z.; Guo, G. Automated Depression Diagnosis Based on Deep Networks to Encode Facial Appearance and Dynamics. IEEE Trans. Affect. Comput. 2018, 9, 578–584. [Google Scholar] [CrossRef]
  13. Yang, L.; Jiang, D.; Xia, X.; Pei, E.; Oveneke, M.C.; Sahli, H. Multimodal Measurement of Depression Using Deep Learning Models. In Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, 23 October 2017. [Google Scholar] [CrossRef]
  14. Shwartz-Ziv, R.; Armon, A. Tabular Data: Deep Learning Is Not All You Need. Inf. Fusion 2022, 81, 84–90. [Google Scholar] [CrossRef]
  15. Arik, S.Ö.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 6679–6687. [Google Scholar] [CrossRef]
  16. Nguyen, H.V.; Byeon, H. Prediction of Out-of-Hospital Cardiac Arrest Survival Outcomes Using a Hybrid Agnostic Explanation TabNet Model. Mathematics 2023, 11, 2030. [Google Scholar] [CrossRef]
  17. Son, R.; Stratoulias, D. Sentinel-5P Based Estimation of PM2.5 Concentrations Across Thailand Using Tabnet. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022. [Google Scholar] [CrossRef]
  18. Asencios, R.; Asencios, C.; Ramos, E. Profit Scoring for Credit Unions Using the Multilayer Perceptron, XGBoost and TabNet Algorithms: Evidence from Peru. Expert Syst. Appl. 2023, 213, 119201. [Google Scholar] [CrossRef]
  19. Knapič, S.; Malhi, A.; Saluja, R.; Främling, K. Explainable Artificial Intelligence for Human Decision Support System in the Medical Domain. Mach. Learn. Knowl. Extr. 2021, 3, 740–770. [Google Scholar] [CrossRef]
  20. Abdullah, T.A.A.; Zahid, M.S.M.; Ali, W. A Review of Interpretable ML in Healthcare: Taxonomy, Applications, Challenges, and Future Directions. Symmetry 2021, 13, 2439. [Google Scholar] [CrossRef]
  21. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30; MIT Press: Cambridge, MA, USA, 2017; pp. 4765–4774. [Google Scholar]
  22. Fan, R.; Hua, T.; Shen, T.; Jiao, Z.; Yue, Q.; Chen, B.; Xu, Z. Identifying Patients with Major Depressive Disorder Based on Tryptophan Hydroxylase-2 Methylation Using Machine Learning Algorithms. Psychiatry Res. 2021, 306, 114258. [Google Scholar] [CrossRef]
  23. Vetter, J.S.; Schultebraucks, K.; Galatzer-Levy, I.; Boeker, H.; Brühl, A.; Seifritz, E.; Kleim, B. Predicting Non-Response to Multimodal Day Clinic Treatment in Severely Impaired Depressed Patients: A Machine Learning Approach. Sci. Rep. 2022, 12, 5455. [Google Scholar] [CrossRef]
  24. Chun, J.Y.; Sendi, M.S.E.; Sui, J.; Zhi, D.; Calhoun, V.D. Visualizing Functional Network Connectivity Difference between Healthy Control and Major Depressive Disorder Using an Explainable Machine-Learning Method. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020. [Google Scholar] [CrossRef]
  25. Rigatti, S.J. Random Forest. J. Insur. Med. 2017, 47, 31–39. [Google Scholar] [CrossRef] [Green Version]
  26. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  27. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154. [Google Scholar]
  28. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 1–11. [Google Scholar]
  29. Radloff, L.S. The CES-D Scale. Appl. Psychol. Meas. 1977, 1, 385–401. [Google Scholar] [CrossRef]
  30. Miller, W.C.; Anton, H.A.; Townson, A.F. Measurement Properties of the CESD Scale among Individuals with Spinal Cord Injury. Spinal Cord. 2007, 46, 287–292. [Google Scholar] [CrossRef]
  31. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29. [Google Scholar] [CrossRef]
  32. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  33. Beckmann, M.; Ebecken, N.F.F.; Pires de Lima, B.S.L. A KNN Undersampling Approach for Data Balancing. J. Intell. Learn. Syst. Appl. 2015, 07, 104–116. [Google Scholar] [CrossRef] [Green Version]
  34. Peiró-Signes, Á.; Segarra-Oña, M.; Trull-Domínguez, Ó.; Sánchez-Planelles, J. Exposing the Ideal Combination of Endogenous–Exogenous Drivers for Companies’ Ecoinnovative Orientation: Results from Machine-Learning Methods. Socio-Econ. Plan. Sci. 2022, 79, 101145.
  35. Keany, E. BorutaShap: A Wrapper Feature Selection Method Which Combines the Boruta Feature Selection Algorithm with Shapley Values; Zenodo: Geneva, Switzerland, 2020.
  36. Kursa, M.B.; Jankowski, A.; Rudnicki, W.R. Boruta—A System for Feature Selection. Fundam. Inform. 2010, 101, 271–285.
  37. Prasad, S.S.; Deo, R.C.; Downs, N.; Igoe, D.; Parisi, A.V.; Soar, J. Cloud Affected Solar UV Prediction with Three-Phase Wavelet Hybrid Convolutional Long Short-Term Memory Network Multi-Step Forecast System. IEEE Access 2022, 10, 24704–24720.
  38. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019.
  39. Hu, C.-A.; Chen, C.-M.; Fang, Y.-C.; Liang, S.-J.; Wang, H.-C.; Fang, W.-F.; Sheu, C.-C.; Perng, W.-C.; Yang, K.-Y.; Kao, K.-C.; et al. Using a Machine Learning Approach to Predict Mortality in Critically Ill Influenza Patients: A Cross-Sectional Retrospective Multicentre Study in Taiwan. BMJ Open 2020, 10, e033898.
  40. Liu, J.; Wu, J.; Liu, S.; Li, M.; Hu, K.; Li, K. Predicting Mortality of Patients with Acute Kidney Injury in the ICU Using XGBoost Model. PLoS ONE 2021, 16, e0246306.
  41. Heldt, F.S.; Vizcaychipi, M.P.; Peacock, S.; Cinelli, M.; McLachlan, L.; Andreotti, F.; Jovanović, S.; Dürichen, R.; Lipunova, N.; Fletcher, R.A.; et al. Early Risk Assessment for COVID-19 Patients from Emergency Department Data Using Machine Learning. Sci. Rep. 2021, 11, 4200.
  42. Sokolova, M.; Lapalme, G. A Systematic Analysis of Performance Measures for Classification Tasks. Inf. Process. Manag. 2009, 45, 427–437.
  43. Azhar, M.A.; Thomas, P.A. Comparative Review of Feature Selection and Classification Modeling. In Proceedings of the 2019 International Conference on Advances in Computing, Communication and Control (ICAC3), Mumbai, India, 20–21 December 2019.
  44. Ali, A.; Gravino, C. Evaluating the Impact of Feature Selection Consistency in Software Prediction. Sci. Comput. Program. 2022, 213, 102715.
  45. Hosseinzadeh Kasani, P.; Lee, J.E.; Park, C.; Yun, C.-H.; Jang, J.-W.; Lee, S.-A. Evaluation of Nutritional Status and Clinical Depression Classification Using an Explainable Machine Learning Method. Front. Nutr. 2023, 10, 1165854.
  46. Kweon, S.; Kim, Y.; Jang, M.-J.; Kim, Y.; Kim, K.; Choi, S.; Chun, C.; Khang, Y.-H.; Oh, K. Data Resource Profile: The Korea National Health and Nutrition Examination Survey (KNHANES). Int. J. Epidemiol. 2014, 43, 69–77.
  47. Zulfiker, M.S.; Kabir, N.; Biswas, A.A.; Nazneen, T.; Uddin, M.S. An In-Depth Analysis of Machine Learning Approaches to Predict Depression. Curr. Res. Behav. Sci. 2021, 2, 100044.
  48. Han, J.-H.; Choi, D.-J.; Park, S.-U.; Hong, S.-K. Hyperparameter Optimization Using a Genetic Algorithm Considering Verification Time in a Convolutional Neural Network. J. Electr. Eng. Technol. 2020, 15, 721–726.
  49. Ali, Y.A.; Awwad, E.M.; Al-Razgan, M.; Maarouf, A. Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes 2023, 11, 349.
Figure 1. General scheme in this study.
Figure 2. Feature selection strategy.
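The feature selection strategy in Figure 2 follows the Boruta idea (refs. 35, 36): each real feature competes against "shadow" features, i.e., shuffled copies that carry no signal. The sketch below illustrates this with plain Random Forest importances on synthetic data; the paper's pipeline used the BorutaShap package, which scores features with SHAP values instead, so this is a simplified illustration, not the exact implementation.

```python
# Boruta-style selection sketch: a feature is kept only if its importance
# exceeds the best importance achieved by any shuffled "shadow" feature.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)

rng = np.random.default_rng(0)
X_shadow = rng.permuted(X, axis=0)       # shuffle each column independently,
X_all = np.hstack([X, X_shadow])         # destroying any feature-label link

forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X_all, y)

real_imp = forest.feature_importances_[:10]
shadow_max = forest.feature_importances_[10:].max()
selected = [i for i in range(10) if real_imp[i] > shadow_max]
print(selected)                          # indices of the surviving features
```

In the full Boruta procedure this comparison is repeated over many shuffles with a statistical test; a single pass like this only approximates the decision.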
Figure 3. Architecture of TabNet.
Figure 4. Advantages of leaf-wise tree growth versus level-wise tree growth.
Figure 5. Confusion matrix of the optimized TabNet model on the test set.
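A confusion matrix such as the one in Figure 5 can be computed directly from the test-set labels and predictions; in the sketch below, `y_test` and `y_pred` are toy stand-ins for the TabNet test-set outputs, not the paper's actual data.

```python
# Sketch of how a test-set confusion matrix like Figure 5 is computed.
from sklearn.metrics import confusion_matrix

y_test = [0, 0, 1, 1, 1, 0, 1, 0]        # true labels (1 = depression)
y_pred = [0, 0, 1, 0, 1, 0, 1, 1]        # model predictions

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)                                 # rows: true class, cols: predicted
print(tn, fp, fn, tp)                     # → 3 1 1 3
```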
Figure 6. Global interpretation: (a) order of feature importance (global bar plot); (b) contribution of features (beeswarm plot).
Figure 7. Local explanation of prediction for sample #1 (force plot).
Figure 8. Local explanation of prediction for sample #2 (force plot).
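The force plots in Figures 7 and 8 rest on SHAP's additive property: base value plus the sum of per-feature contributions equals the model output for that sample. The hand-rolled illustration below uses a toy linear model, for which (with independent features) the exact Shapley value of feature i is simply w_i·(x_i − mean_i); the paper itself obtained its SHAP values from the shap library applied to the trained TabNet model.

```python
# Additivity behind a SHAP force plot: base value + sum(contributions)
# equals the model's output for the explained sample (toy linear model).
import numpy as np

w = np.array([0.8, -1.2, 0.5])            # toy linear model weights
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0],
              [2.0, 2.0, 1.0]])           # background data
x = X[0]                                  # the sample being explained

base_value = X.mean(axis=0) @ w           # expected output over the background
shap_values = w * (x - X.mean(axis=0))    # exact per-feature contributions
output = x @ w

# The contributions "push" the prediction from the base value to the output.
print(shap_values)                        # → [0.   1.2  0.5]
print(base_value + shap_values.sum())     # equals output (1.8)
```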
Table 1. Demographic make-up of survey participants.
| Characteristic | Category | Sample (%) |
|---|---|---|
| Sex | Male | 80.28 |
| | Female | 19.72 |
| Age | ≤29 | 3.17 |
| | 30–49 | 42.22 |
| | 50–69 | 47.90 |
| | ≥70 | 6.71 |
| Level of education | No study (7 years old or older) | 0.20 |
| | Elementary school | 1.10 |
| | Middle school | 3.11 |
| | High school | 38.20 |
| | University (less than 4 years) | 14.77 |
| | University (more than 4 years) | 41.72 |
| | Graduate school (Master’s) | 0.76 |
| | Graduate school (PhD) | 0.14 |
Table 2. AUC of the default models for each feature subset.
| Feature Subset | RF | XGBoost | LGBM | CatBoost | TabNet |
|---|---|---|---|---|---|
| All features | 0.9759 | 0.9732 | 0.9744 | 0.9711 | 0.8927 |
| RF features | 0.9767 | 0.9767 | 0.9765 | 0.9758 | 0.9040 |
| XGBoost features | 0.9766 | 0.9759 | 0.9759 | 0.9763 | 0.9247 |
| LightGBM features | 0.9762 | 0.9764 | 0.9764 | 0.9759 | 0.9119 |
| CatBoost features | 0.9769 | 0.9764 | 0.9766 | 0.9762 | 0.9169 |
| TabNet features | 0.9693 | 0.9687 | 0.9671 | 0.9675 | 0.8893 |
| voting_1 | 0.9770 | 0.9765 | 0.9766 | 0.9761 | 0.9263 |
| voting_2 | 0.9770 | 0.9760 | 0.9761 | 0.9760 | 0.9190 |
| voting_3 | 0.9766 | 0.9764 | 0.9762 | 0.9759 | 0.9151 |
| voting_4 | 0.9765 | 0.9761 | 0.9760 | 0.9761 | 0.9302 |
| voting_5 | 0.9621 | 0.9631 | 0.9625 | 0.9628 | 0.9030 |
RF = Random Forest classification model; XGBoost = eXtreme Gradient Boosting classification model; LGBM = Light Gradient Boosting classification model; CatBoost = CatBoost classification model; TabNet = TabNet classification model.
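One plausible way to form the voting_1 … voting_5 subsets of Table 2 is to count, for each feature, how many of the five model-specific selections kept it, and let voting_k contain every feature kept by at least k models. The sketch below illustrates this reading; the feature names are placeholders, not the actual survey item codes, and the "at least k votes" interpretation is an assumption.

```python
# Vote-counting sketch: voting_k = features kept by at least k of the
# five model-specific feature selections (assumed interpretation).
from collections import Counter

subsets = {
    "RF":       {"a", "b", "c", "d"},
    "XGBoost":  {"a", "b", "c"},
    "LightGBM": {"a", "b", "e"},
    "CatBoost": {"a", "c", "e"},
    "TabNet":   {"a", "d"},
}

votes = Counter(f for kept in subsets.values() for f in kept)
voting = {k: sorted(f for f, n in votes.items() if n >= k) for k in range(1, 6)}

print(voting[1])   # union of all subsets → ['a', 'b', 'c', 'd', 'e']
print(voting[5])   # features all five models agreed on → ['a']
```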
Table 3. Selected features and their description.
| Feature | Description | Field Type |
|---|---|---|
| code26 | Last school of household member 1 | 1: preschool (less than 7 years old); 2: no study (7 years old or older); 3: elementary school; 4: middle school; 5: high school; 6: university (less than 4 years); 7: university (more than 4 years); 8: graduate school (Master’s); 9: graduate school (PhD) |
| code51 | Type of occupancy of the house in which you live | 1: self; 2: charters; 3: monthly rent with deposit; 4: monthly rent without deposit; 5: free; 6: miscellaneous |
| code54 | Residential house size (number of rooms) | Numeric |
| code60 | Energy used for heating (1st priority) | 1: district heating; 2: oil; 3: electricity; 4: city gas; 5: LPG gas; 6: briquettes; 7: miscellaneous |
| code62 | Experience of being unable to heat due to heating costs | 1: yes; 2: no |
| code67 | Recognition of housing welfare-related projects—support for housing purchase funds (loans) | 1: unknown; 2: I’ve heard of it, but I don’t know what it is; 3: some knowledge of the content; 4: relatively well known |
| code76 | Whether housing welfare-related projects are used—support for housing purchase funds (loans) | 1: currently in use; 2: previous experience; 3: none |
| code84 | Degree of assistance from housing welfare-related projects—Jeonse (loan) support | 1: not helpful at all; 2: not very helpful; 3: normal; 4: slightly helpful; 5: very helpful |
| code94 | Residential welfare-related projects, intent to use in the future—support for housing purchase funds (loans) | 1: none; 2: slightly hopeful; 3: very hopeful |
| code103 | Household member 1 participation in economic activities | 1: full-time wage worker; 2: temporary wage worker; 3: daily wage worker; 4: self-sufficiency work, public work, jobs for the elderly; 5: employer with employees; 6: self-employed without employees; 7: unpaid family worker; 8: unemployed; 9: economically inactive population |
| code140 | No. of private medical insurance subscriptions per household member | Numeric |
| code193 | Lack of food expenses—worrying about food | 1: often; 2: sometimes; 3: never; 4: don’t know/refuse to answer |
| code195 | Experience of food shortage—not eating a well-balanced diet | 1: often; 2: sometimes; 3: never; 4: don’t know/refuse to answer |
| code225 | Current number of children | Numeric |
| code226 | Expected number of children | Numeric |
| code262 | The most necessary support for work/family balance (1st priority) | 1: strengthen maternity and parental leave; 2: childcare and education assistance; 3: providing high-quality childcare facilities; 4: parental part-time jobs; 5: flexible work system expansion; 6: family-friendly workplace culture; 7: society-wide awareness; 8: miscellaneous |
| code265 | Household help—working mom support center | 1: not helpful at all; 2: not very helpful; 3: normal; 4: slightly helpful; 5: very helpful |
| code277 | Participation experience in the past year—neighborhood association, women’s association, residents’ meeting | 1: did not participate at all; 2: did not participate; 3: occasionally participated; 4: often participated |
| code285 | Willingness to participate in the future—civic movement groups, social group activities | 1: none; 2: not willing; 3: normal; 4: willing; 5: actively participating |
| code286 | Willingness to participate in the future—voting in elections | 1: none; 2: not willing; 3: normal; 4: willing; 5: actively participating |
| code288 | Willingness to donate in the future | 1: no; 2: yes |
| code297 | Recognition—Seoul-type basic security system | 1: don’t know; 2: I’ve heard of it, but I don’t know what it is; 3: some knowledge of the content; 4: relatively detailed |
| code571 | Improvement compared to 2019—culture and leisure | 0: N/A; 1: very displeased; 2: displeased; 3: normal; 4: satisfied; 5: very satisfied |
| code574 | Feeling of happiness the day before the survey | 0: not happy at all; 1: level 1; 2: level 2; 3: level 3; 4: level 4; 5: level 5; 6: level 6; 7: level 7; 8: level 8; 9: level 9; 10: I was very happy |
| code587 | Welfare policy direction/measures for realization—payment of service fee according to ability (A) vs. provided free of charge (B) | 1: A is very important; 2: A is important; 3: half and half; 4: B is important; 5: B is very important |
| code593 | Policy areas that Seoul should focus on the most (1st priority) | 1: caring for children (0–18 years old); 2: caring for adults (elderly, disabled, etc.); 3: protection and safety policy; 4: health policy; 5: education policy; 6: employment policy; 7: housing policy; 8: culture and leisure policy; 9: environmental policy; 10: quality of life and local infrastructure; 11: miscellaneous |
| code594 | Targets that Seoul should prioritize support for (1st priority) | 1: toddlers under the age of 5; 2: school-age children and adolescents; 3: youth; 4: old people; 5: middle-aged; 6: disabled; 7: low income; 8: working women with children; 9: all women; 10: single-parent families; 11: multicultural families; 12: single-person households; 13: miscellaneous |
| code604 | Whether you have experience using welfare facilities for the elderly | 1: currently in use; 2: previous experience; 3: none |
| code607 | Degree of household help—infant welfare facility | 1: not helpful at all; 2: not very helpful; 3: normal; 4: slightly helpful; 5: very helpful |
| code635 | Current emotional state compared to pre-COVID-19 | Numeric (score range: 0–100) |
| code637 | Recognition—Seoul emergency living expenses | 1: don’t know; 2: I’ve heard of it, but I don’t know what it is; 3: some knowledge of the content; 4: relatively detailed |
| code639 | Whether Seoul disaster emergency living expenses were used | 1: don’t know; 2: received; 3: not received; 4: donated |
| code640 | Level of satisfaction with benefits—emergency disaster support fund | 0: N/A; 1: very displeased; 2: displeased; 3: normal; 4: satisfied; 5: very satisfied |
| code642 | Level of household help—emergency disaster relief funds | 1: not helpful at all; 2: not very helpful; 3: normal; 4: slightly helpful; 5: very helpful |
| code643 | Household assistance level—Seoul disaster emergency living expenses | 1: not helpful at all; 2: not very helpful; 3: normal; 4: slightly helpful; 5: very helpful |
| code659 | Head of household—level of education | 1: primary school graduate; 2: middle school graduate; 3: high school or less; 4: undergraduate; 5: graduate |
Table 4. Performance comparison of optimized prediction models.
| Set | Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|---|
| Training set | TabNet | 0.9702 | 0.9792 | 0.9714 | 0.9752 | 0.9957 |
| | XGBoost | 0.9666 | 0.9716 | 0.9734 | 0.9724 | 0.9947 |
| | LGBM | 0.9666 | 0.9729 | 0.9719 | 0.9723 | 0.9947 |
| | CatBoost | 0.9645 | 0.9723 | 0.9689 | 0.9706 | 0.9946 |
| | RF | 0.9657 | 0.9729 | 0.9704 | 0.9716 | 0.9940 |
| Test set | TabNet | 0.9604 | 0.9356 | 0.9450 | 0.9403 | 0.9937 |
| | XGBoost | 0.9488 | 0.8967 | 0.9550 | 0.9249 | 0.9848 |
| | LGBM | 0.9472 | 0.9000 | 0.9450 | 0.9220 | 0.9835 |
| | CatBoost | 0.9505 | 0.9087 | 0.9450 | 0.9265 | 0.9832 |
| | RF | 0.9521 | 0.9130 | 0.9450 | 0.9287 | 0.9874 |
RF = Random Forest classification model; XGBoost = eXtreme Gradient Boosting classification model; LGBM = Light Gradient Boosting classification model; CatBoost = CatBoost classification model; TabNet = TabNet classification model.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Nguyen, H.V.; Byeon, H. Predicting Depression during the COVID-19 Pandemic Using Interpretable TabNet: A Case Study in South Korea. Mathematics 2023, 11, 3145. https://doi.org/10.3390/math11143145
