Article
Peer-Review Record

Enhancing Depression Detection: A Stacked Ensemble Model with Feature Selection and RF Feature Importance Analysis Using NHANES Data

Appl. Sci. 2024, 14(16), 7366; https://doi.org/10.3390/app14167366 (registering DOI)
by Annapoorani Selvaraj * and Lakshmi Mohandoss
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 3 July 2024 / Revised: 14 August 2024 / Accepted: 15 August 2024 / Published: 21 August 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Overall, the manuscript addresses an important topic by analysing survey data to classify participants as depressed or non-depressed. I would like the authors to update the manuscript in view of the following suggestions.

1. The first line of the abstract states that "...5% of adults suffer from depression...", but this claim is not mentioned in the body of the manuscript. I suggest including it in the main body (most likely in the introduction), with a suitable reference cited to support it.

2. The second line in the abstract explains "A complex relationship of cultural, psychological, and physical factors causes clinical depression.". Would you please explain what you mean by clinical depression? Should this not be simply "depression" in this statement?

3. The terms SF-36 and DESGM are used in the abstract. Would you please give their full forms on first appearance?

4. In the introduction, the first line "According to the World Health Organization, depression will emerge as a leading contributor to the disability by 2030" must be supported by citing a suitable reference. 

5. On page 2, the statement "This challenge is caused by the fact that depression is a relatively uncommon condition in the general population, which results in datasets that are often imbalance" seems confusing. Should this not be true of disease- or condition-specific datasets in general?

 6. On page 3, under the section "related work", the statement "Machine learning algorithms learn, obtain, recognize, and map underlying patterns to identify depressed groupings without limits." seems confusing. Either explain each term (learn, obtain, recognize, and map) or rephrase the statement so that it is easy for the reader to understand.

7. On page 3, second paragraph, would you explain how you differentiate "over and over-under"?

8. On page 5, give the full form of LR on its first mention. 

9. On page 5, reference 38 appears to use parentheses () instead of brackets []; please update this.

10. On page 5, when describing dataset, would you please give a full form of the NHANES term.

11. In section "3.1. Data Source Representation", the last statement "This program has been." seems incomplete. 

12. Although the dataset source is mentioned in the "Availability of Data and Materials" section at the end of the manuscript, it would be helpful to cite the source of the NHANES dataset as a reference in section "3.1. Data Source Representation".

13. On page 6, the term "file SFo" is mentioned and it would be great if you explain this. Is it just the file name or something else?

14. Explain the expectation-maximization technique in a sentence.

15. On page 6 - "After eliminating several unrelated primary inquiries, a total of 110 features were generated" - should the last word "generated" not be replaced with "selected"?

16. Figure 1 should be updated with a high-resolution picture.

17. On page 11, the first line "machine learning models were developed on the training data" should use the word "trained" instead of "developed".  

18. On page 14, the table 4 explains feature type - would you mention in the text what those "general" features are? 

19. Be consistent in presenting references - especially where DOI is given. For example reference 6 has "doi: 10.1016/B978-0-12-818438-7.00002-2." whereas reference 7 does not have "doi:" in the beginning rather it has "https://doi.org/10.3390/electronics11071111."

Comments on the Quality of English Language

Generally, the language requires improvement. In some places it is difficult to follow the statements, e.g., (1) "Normalization was additionally required to standardize the scale condition across different questions" and (2) "This study assessed the feature dependence of the models for depression prediction to develop an accurate model depending on a constrained set of available features."

Author Response

  1. The first line of abstract explains that "...5% of adults suffer from depression...", no mention of this claim in the body of the manuscript. I will suggest to include this in the main body (most likely in introduction) and a suitable reference must be cited to support this claim. 

Response:

Thank you for your valuable feedback. We will include the claim that "...5% of adults suffer from depression..." in the introduction section of the manuscript and provide a suitable reference to support this statement. We appreciate your attention to detail and will make the necessary revisions promptly.

  2. The second line in the abstract explains "A complex relationship of cultural, psychological, and physical factors causes clinical depression.". Would you please explain what you mean by clinical depression? Should this not be simply "depression" in this statement?

Response:

Thank you for your insightful feedback. You are correct that clinical depression and major depressive disorder refer to the same condition. To avoid any confusion, we will revise the second line in the abstract to read: "A complex relationship of cultural, psychological, and physical factors causes depression." This change will ensure consistency and clarity in the terminology used. We appreciate your guidance on this matter.

  3. The terms SF-36 and DESGM are used in the abstract. Would you please give their full forms on first appearance?

Response:

We will ensure that the full forms of SF-36 and DESGM are provided upon their first appearance in the abstract for clarity. This will be included in this revision. Thank you for bringing this to our attention.

  4. In the introduction, the first line "According to the World Health Organization, depression will emerge as a leading contributor to the disability by 2030" must be supported by citing a suitable reference.

Response:

We will add a suitable reference to support the statement in the introduction that "According to the World Health Organization, depression will emerge as a leading contributor to disability by 2030." We appreciate your guidance in ensuring our manuscript is properly cited and supported by reliable sources.

  5. On page 2, the statement "This challenge is caused by the fact that depression is a relatively uncommon condition in the general population, which results in datasets that are often imbalance" seems confusing. Should this not be true of disease- or condition-specific datasets in general?

Response:

Thank you for your feedback. You are correct that the issue of imbalanced datasets can apply to many diseases and conditions, not just depression. We will revise the statement on page 2 to clarify this point.

 

  6. On page 3, under the section "related work", the statement "Machine learning algorithms learn, obtain, recognize, and map underlying patterns to identify depressed groupings without limits." seems confusing. Either explain each term (learn, obtain, recognize, and map) or rephrase the statement so that it is easy for the reader to understand.

Response:

We will rephrase the statement on page 3 under the "Related Work" section to improve clarity. The revised statement will read: "Machine learning algorithms identify depressed groupings by learning and recognizing underlying patterns in the data without predefined limits." We appreciate your guidance in making our manuscript more comprehensible for the readers.

 

  7. On page 3, second paragraph, would you explain how you differentiate "over and over-under"?

 

Response:

Thank you for your feedback. We acknowledge the confusion caused by the term "over and over-under" in the sentence on page 3, second paragraph. We will revise the sentence to clarify the differentiation between over-sampling and under-sampling techniques.

  8. On page 5, give the full form of LR on its first mention.

Response:

Thank you for your feedback. We will ensure that the full form of LR (Logistic Regression) is provided upon its first mention on page 5 for clarity.

 

  9. On page 5, reference 38 appears to use parentheses () instead of brackets []; please update this.

 

Response:

Thank you for pointing this out. We will update reference 38 on page 5 to use brackets [] instead of parentheses ().

 

  10. On page 5, when describing the dataset, would you please give the full form of the term NHANES?

 

Response:

Thank you for your feedback. We will ensure that the full form of NHANES (National Health and Nutrition Examination Survey) is provided when first mentioned on page 5.

  11. In section "3.1. Data Source Representation", the last statement "This program has been." seems incomplete.

Response:

Thank you for your suggestion. We will add a citation for the source of the NHANES dataset in section "3.1. Data Source Representation" to provide clear and immediate reference to the dataset. We appreciate your guidance in improving our manuscript.

  12. Although the dataset source is mentioned in the "Availability of Data and Materials" section at the end of the manuscript, it would be helpful to cite the source of the NHANES dataset as a reference in section "3.1. Data Source Representation".

Response:

Thank you for your suggestion. We will add a citation for the source of the NHANES dataset in section "3.1. Data Source Representation" to provide clear and immediate reference to the dataset.

 

  13. On page 6, the term "file SFo" is mentioned and it would be great if you explain this. Is it just the file name or something else?

Response:

Thank you for your feedback. We will clarify the term "file SFo" mentioned on page 6. It refers to a file that contains all 40 topic-based files from the 2015–16 cycle using the supplied sequence ID. We will update the manuscript to explain this more clearly.

 

  14. Explain the expectation-maximization technique in a sentence.

 

Response:

Thank you for your feedback. We will explain the expectation-maximization technique as follows: "The expectation-maximization (EM) algorithm iteratively estimates missing or incomplete data to compute maximum likelihood estimates."
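
To make the one-sentence definition above concrete, here is a minimal sketch of EM for a single Gaussian variable with missing entries. It is not the authors' imputation procedure, which this record does not describe; the function name `em_gaussian_missing`, the toy data, and the starting values are assumptions made purely for illustration.

```python
def em_gaussian_missing(values, iterations=200):
    """EM estimates of mean and variance for one Gaussian variable.

    Missing entries are given as None. Hypothetical helper for illustration;
    not taken from the manuscript under review.
    """
    observed = [v for v in values if v is not None]
    n = len(values)
    n_missing = n - len(observed)
    sum_obs = sum(observed)
    sum_sq_obs = sum(v * v for v in observed)

    mean, var = 0.0, 1.0  # arbitrary starting guess (an assumption)
    for _ in range(iterations):
        # E-step: expected value and expected square of each missing entry
        # under the current parameter estimates.
        exp_x = mean
        exp_x_sq = mean * mean + var
        # M-step: maximum likelihood update from the completed statistics.
        new_mean = (sum_obs + n_missing * exp_x) / n
        var = (sum_sq_obs + n_missing * exp_x_sq) / n - new_mean * new_mean
        mean = new_mean
    return mean, var


# Toy data with two missing entries (made up for illustration).
mean, var = em_gaussian_missing([1.0, None, 3.0, None, 5.0])
print(round(mean, 4), round(var, 4))
```

In this simple one-variable case the iteration settles on the mean and variance of the observed values alone; with several correlated variables, the E-step would instead condition each missing entry on the observed entries of the same record.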

 

  15. On page 6 - "After eliminating several unrelated primary inquiries, a total of 110 features were generated" - should the last word "generated" not be replaced with "selected"?

Response:

Thank you for your feedback. You are correct; "selected" is a more appropriate term. We will update the sentence on page 6 to read: "After eliminating several unrelated primary inquiries, a total of 110 features were selected."

 

  16. Figure 1 should be updated with a high-resolution picture.

 

Response:

Thank you for your feedback. We will update Figure 1 with a high-resolution image to ensure clarity and quality.

 

  17. On page 11, the first line "machine learning models were developed on the training data" should use the word "trained" instead of "developed".

 

Response:

Thank you for your feedback. We will update the first line on page 11 to use the word "trained" instead of "developed," so it reads: "machine learning models were trained on the training data."

18. On page 14, the table 4 explains feature type - would you mention in the text what those "general" features are?

 

Response:

Thank you for your feedback. We have revised the text to explain the "general" features in the paragraph before Table 4.

19. Be consistent in presenting references - especially where DOI is given. For example reference 6 has "doi: 10.1016/B978-0-12-818438-7.00002-2." whereas reference 7 does not have "doi:" in the beginning rather it has https://doi.org/10.3390/electronics11071111.

 

Response:

Thank you for your feedback. We will ensure consistency in presenting references, particularly with the DOI format. We will update all references to include "doi:" at the beginning where applicable.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Comments

1- The abstract can be shortened without losing value.

2- Section 1 should end with a paragraph stating the organization of the remainder of the paper.

3 – Section 3.1 ends with “…..United States. This program has been.”  Please conclude this sentence about the explanation of the program.

4 – The authors use “dataset” and “data set”. Please use a consistent (the same) designation on the entire paper.

5 – In Algorithm 1, please state how one should proceed to set the desired number of features, d. Please give the reader some hints on how to set this input parameter for a given task.

6 – Please, do not use the same symbol with double meaning. On equation (1), ‘m’ denotes the number of base learners. On section 3.4.1, ‘m’ denotes the number of training samples.

7 – After equation (3), we have “Where 𝜂 is the learning rate, 0 < 𝜂 <.”. Please provide the upper bound value for the parameter.

8 – In Section 3.4.1, the training of the perceptron (is it necessary to describe it on the paper?) can be presented with an algorithmic style.

9 – In Section 3.4.1, the training of the LSVM (is it necessary to describe it on the paper?) can be presented with an algorithmic style.

10 – On the main text, please refer to the figures on the paper. I have found no mention to Figure 1, Figure 2, or Figure 6, for instance. Please introduce and explain the figures to the reader.

11 – Algorithm 2, step 2, mentions “seven psychological functions”. In Figure 2, after the “cross validation set” we observe 8 (and not 7) rectangles with data and functions. Please check on this.

12 - On page 13, please number the Recall, Precision, Accuracy, and F1 score equations.

13 – Table 3. The IFEA and the GA methods are mentioned for the first time on the experimental results of Table 3. There is no mention to these methods before. Please briefly explain these methods, before the presentation of their experimental results.

14 – Table 5, on the “Published Research Paper” column. Please add the proper literature reference number to each paper. Please use the “et al.” formulation for papers with at least 3 authors.

15 – In Figure 6, please state clearly the name of the metric being reported.

16 – A suggestion: on the experimental results, the authors may present a confusion matrix, highlighting the “with no instances of depression being wrongly diagnosed.” mentioned at the end of the abstract.

 

Comments on Writing

1 – Abstract

Please revise the sentence

“A novel DESGM model to enhance the classification performance of both the base (linear support vector machine, perceptron, artificial neural network, linear discriminant analysis, and K-nearest neighbor) and meta-learners (logistic regression).”

 

This sentence has no conclusion; it seems that something is missing.

 

2 – The literature reference numbers should be placed inside the sentences. Please check the entire paper for this issue. Here are some examples:

 

Section 1. Page 1.

and community levels. [1]

->

and community levels [1].  

 

Section 1. Page 2.

socio-cultural characteristics. [2]

->

socio-cultural characteristics [2].

 

the causes of these difficulties. [3]

->

the causes of these difficulties [3].

 

3 – Section 1. Page 2.

Please revise the sentence “Recognizing the importance of several disciplines and approaches, including interactive research techniques, establishing and carrying out research on epidemiology, medical, public health, and socio-cultural predictions for depression.”

 

This sentence has no conclusion; it seems that something is missing.

4 – Section 2. Page 3.

Please revise the sentence “Predictions of depression also using unsupervised machine learning techniques.”

This sentence has no conclusion; it seems that something is missing.

 

5 – Table 1.

KLoSA[23],UFMG[47 ]

->

KLoSA [23], UFMG [47]

 

NHANES(2005-16) [24]

->

NHANES (2005-16) [24]

 

 

6 – Section 2. Page 4.

utilize the feature selection and importance in NHANES data

->

utilize the feature selection and importance in NHANES data.

 

techniques to predict depression here class imbalance is raising

->

techniques to predict depression where class imbalance is raising

 

 

7 – Section 2.1. Page 5.

potential correlation between depression [21]

->

potential correlation between depression [21].

 

8 – End of Section 2.2. Page 5.

…different resampling techniques

->

…different resampling techniques.

 

9 - Page 6.

comprising 5134 individuals

->

comprising 5134 individuals.

 

10 – Page 9.

any element𝑥 belonging to the set

->

any element 𝑥 belonging to the set

 

11 – Page 10

It is a widely employed method

->

LDA is a widely employed method

 

with 𝑋0equal to 1.

->

with 𝑋0 equal to 1.

 

12 – Page 13.

The equation representing each evaluation index is shown below.

->

The equations representing each evaluation index are shown below.

 

13 – Section 4.2

We have “… physical activity, genral and depression screener datasets.” What is “general”? Is it general? Please check.

 

14 – Page 15

ensemble classifier (Table 5) has presented exhibits a substantial performance

->

ensemble classifier (Table 5) exhibits a substantial performance

 

15 – Figure 5. Please correct “Percision”

 

16 – Page 17

The importance ranking (Figure 5) provided

->

The importance ranking (Figure 6) provided

We have “References 41 and 42 have provided”. Please add the real reference (citation) numbers to the papers on the reference list.

We have “According to previous studies (43,44), it has been demonstrated”. Please add the real reference (citation) numbers to the papers on the reference list.

We have “The view described in the prior source (45) is prevalent”. Please add the real reference (citation) numbers to the papers on the reference list.

17 – Page 18. The coronary heart disease (CHD) acronym is defined twice.

 

Comments on the Quality of English Language

The Quality of English Language is very good.

I have found some minor issues to be corrected.

The literature reference numbers should be placed inside the sentences like this [1].

 

 

 

Author Response

  1. The abstract can be shortened without losing value.

 

Response:

Thank you for your feedback. We will shorten the abstract while maintaining its value and clarity.

 

  2. Section 1 should end with a paragraph stating the organization of the remainder of the paper.

 

Response:

Thank you for your feedback. We will revise Section 1 to conclude with a paragraph outlining the organization of the remainder of the paper.

 

  3. Section 3.1 ends with “…..United States. This program has been.” Please conclude this sentence about the explanation of the program.

 

Response:

Thank you for your feedback. We will revise the ending of Section 3.1 to provide a complete explanation of the program.

 

  4. The authors use “dataset” and “data set”. Please use a consistent (the same) designation on the entire paper.

 

Response:

Thank you for pointing this out. We will ensure that the term "dataset" is used consistently throughout the entire paper.

 

  5. In Algorithm 1, please state how one should proceed to set the desired number of features, d. Please give the reader some hints on how to set this input parameter for a given task.

 

Response:

Thank you for your feedback. We will update Algorithm 1 to include guidance on setting the desired number of features, d. We will provide hints on how to determine this input parameter based on the specific task and dataset characteristics.

 

  6. Please, do not use the same symbol with double meaning. On equation (1), ‘m’ denotes the number of base learners. On section 3.4.1, ‘m’ denotes the number of training samples.

 

Response:

Thank you for your feedback. We will revise the manuscript to avoid using the same symbol with double meaning. Specifically, we will change the symbol for the number of training samples in section 3.4.1 to ensure clarity and consistency.

 

  7. After equation (3), we have “Where 𝜂 is the learning rate, 0 < 𝜂 <.”. Please provide the upper bound value for the parameter.

 

Response:

Thank you for your feedback. We will update the text after equation (3) to provide the upper bound value for the learning rate parameter, η. The revised text will read: "Where η is the learning rate, 0 < η < 1."

 

  8. In Section 3.4.1, the training of the perceptron (is it necessary to describe it on the paper?) can be presented with an algorithmic style.

 

Response:

Thank you for your feedback. We will revise Section 3.4.1 to present the training of the perceptron in an algorithmic style for clarity and readability. We believe that this description is necessary for a comprehensive understanding of our methodology.

 

  9. In Section 3.4.1, the training of the LSVM (is it necessary to describe it on the paper?) can be presented with an algorithmic style.

 

Response:

Thank you for your feedback. We will present the training of the LSVM in Section 3.4.1 using an algorithmic style for better clarity and readability. To truly understand our process, we think it's important to include this description.

 

  10. On the main text, please refer to the figures on the paper. I have found no mention to Figure 1, Figure 2, or Figure 6, for instance. Please introduce and explain the figures to the reader.

 

Response:

Thank you for your feedback. We will revise the main text to ensure that all figures, including Figure 1, Figure 2, and Figure 6, are appropriately introduced and explained to the reader. This will help enhance the clarity and comprehensibility of our manuscript.

 

  11. Algorithm 2, step 2, mentions “seven psychological functions”. In Figure 2, after the “cross validation set” we observe 8 (and not 7) rectangles with data and functions. Please check on this.

 

Response:

Thank you for your feedback. We will review Algorithm 2 and Figure 2 to ensure consistency. Specifically, we will verify whether there are seven or eight psychological functions and update the text and figures accordingly to ensure they match. We appreciate your attention to this detail.

 

  12. On page 13, please number the Recall, Precision, Accuracy, and F1 score equations.

 

Response:

Thank you for your feedback. We will number the Recall, Precision, Accuracy, and F1 score equations on page 13 for better clarity and reference.

 

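  13. Table 3. The IFEA and the GA methods are mentioned for the first time on the experimental results of Table 3. There is no mention to these methods before. Please briefly explain these methods, before the presentation of their experimental results.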

 

Response:

Thank you for your feedback. We will ensure that the Iterative Floating Elimination Algorithm (IFEA) and Genetic Algorithm (GA) methods are introduced and explained earlier in the manuscript before their mention in the experimental results of Table 3.  This will provide the necessary context for readers to understand the methods being compared. We appreciate your attention to this detail.

 

  14. Table 5, on the “Published Research Paper” column. Please add the proper literature reference number to each paper. Please use the “et al.” formulation for papers with at least 3 authors.

 

Response:

Thank you for your feedback. We will update Table 5 to include the proper literature reference number for each paper in the "Published Research Paper" column. Additionally, we will use the "et al." formulation for papers with at least three authors.

 

  15. In Figure 6, please state clearly the name of the metric being reported.

 

Response:

Thank you for your feedback. We will update Figure 6 to clearly state the name of the metric being reported.

 

  16. A suggestion: on the experimental results, the authors may present a confusion matrix, highlighting the “with no instances of depression being wrongly diagnosed.” mentioned at the end of the abstract.

 

Response:

Thank you for your suggestion. We will include a confusion matrix in the experimental results section to highlight the statement "with no instances of depression being wrongly diagnosed" mentioned at the end of the abstract. This will provide a clear visual representation of our model's performance and support our claims. We appreciate your guidance in improving the comprehensiveness of our results presentation.
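
As a hedged illustration of what such a confusion matrix reports, the sketch below counts the four cells for a binary task and derives the usual accuracy, precision, recall, and F1 values from them. The labels are made up, the helper names `confusion_counts` and `summary_metrics` are hypothetical, and none of the numbers come from the manuscript.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Standard accuracy, precision, recall and F1 from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = depressed, 0 = not depressed).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
scores = summary_metrics(*counts)
print(counts)
print(scores)
```

In this layout, a claim such as "no instances of depression being wrongly diagnosed" would correspond to a zero in one of the off-diagonal cells (`fp` or `fn`, depending on how the claim is read), which is why the reviewer's suggestion makes the claim easy to check.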

 

Comments on Writing

  1. Abstract (Please revise the sentence) “A novel DESGM model to enhance the classification performance of both the base (linear support vector machine, perceptron, artificial neural network, linear discriminant analysis, and K-nearest neighbor) and meta-learners (logistic regression).”

Response:

Thank you for your feedback. We will revise the sentence while maintaining its value and clarity.

 

  2. The literature reference numbers should be placed inside the sentences. Please check the entire paper for this issue. Here are some examples:

Response:

Thank you for your feedback. We appreciate your careful review of our manuscript. We have thoroughly checked the entire paper and ensured that all literature reference numbers are now placed inside the sentences.

 

  3. Section 1. Page 2. Please revise the sentence “Recognizing the importance of several disciplines and approaches, including interactive research techniques, establishing and carrying out research on epidemiology, medical, public health, and socio-cultural predictions for depression.”

Response:

Thank you for your feedback. We have revised the sentence for clarity and readability. The revised sentence is as follows: "Recognizing the importance of multiple disciplines and approaches, including interactive research techniques, we aim to establish and conduct research on the epidemiological, medical, public health, and socio-cultural predictors of depression."

  4. Section 2. Page 3. Please revise the sentence “Predictions of depression also using unsupervised machine learning techniques.” This sentence has no conclusion; it seems that something is missing.

Response:

Thank you for your feedback. We have revised the sentence to ensure clarity and completeness. The revised sentence is as follows: "Predictions of depression can also be enhanced using unsupervised machine learning techniques, which help uncover hidden patterns and relationships within the data."

 

  1. Comments 6,7,8,9,10,11,12,13,14,15,16,17

Response:

Thank you for your feedback. We have revised the manuscript in line with comments 6 to 17 to ensure clarity and completeness.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper focuses on depression detection using machine learning techniques. The topic is interesting and worth investigating. However, several drawbacks are present in the paper. The English in the paper requires extensive editing, with many paragraphs that are hard to follow or to understand properly. Additional issues are discussed in the following.

- The Abstract mentions "SF-36" without any additional explanation of the term. The term is likely not to be familiar to many readers. Additional acronyms that are likely to be unfamiliar to at least some readers include "DESGM".

- The following sentence in the Abstract is missing a verb, making its meaning unclear: “A novel DESGM model to enhance the classification performance of both the base (linear support vector machine, perceptron, artificial neural network, linear discriminant analysis, and K-nearest neighbor) and meta-learners (logistic regression).”

- It is not clear why the authors have chosen to write "Iterative" in the Abstract with a capital letter.

- The paper should be written in a more careful manner. For example, the first sentence in section 3.1 is "This study utilized data collected throughout the period spanning from 2015 to 2016.". The following sentence is almost identical: "The study used data collected throughout 2015-2016.". Additionally, the last sentence in section 3.1 is "This program has been.", which seems to be a mistake.

- Section 4.1 refers to Recall, Prediction, Accuracy and F1-Score as indices and indexes. Recall, Prediction, Accuracy and F1-Score are neither indices nor indexes.

Comments on the Quality of English Language

The English in the paper requires extensive editing, with many paragraphs that are hard to follow or to understand properly.

Author Response

  1. The Abstract mentions "SF-36" without any additional explanation of the term. The term is likely not to be familiar to many readers. Additional acronyms that are likely to be unfamiliar to at least some readers include "DESGM".

Response:

Thank you for your feedback. We will provide definitions or brief explanations for these terms to ensure clarity for all readers.

 

  2. The following sentence in the Abstract is missing a verb, making its meaning unclear: “A novel DESGM model to enhance the classification performance of both the base (linear support vector machine, perceptron, artificial neural network, linear discriminant analysis, and K-nearest neighbor) and meta-learners (logistic regression).”

Response:

Thank you for pointing out the issue with the clarity of the sentence in the Abstract. We have revised the sentence to include the missing verb and ensure that the meaning is clear.

  3. It is not clear why the authors have chosen to write "Iterative" in the Abstract with a capital letter.

Response:

Thank you for your observation regarding the capitalization of the term "Iterative" in the Abstract. To address this, we have revised the Abstract to ensure consistency and clarity. The term "iterative" is now presented in lowercase to accurately reflect its role as a descriptive term rather than a proper noun or title.

 

 

  4. The paper should be written in a more careful manner. For example, the first sentence in section 3.1 is "This study utilized data collected throughout the period spanning from 2015 to 2016.". The following sentence is almost identical: "The study used data collected throughout 2015-2016.". Additionally, the last sentence in section 3.1 is "This program has been.", which seems to be a mistake.

Response:

Thank you for your insightful feedback regarding the clarity and precision of our writing. We have reviewed Section 3.1 and made the necessary revisions to improve the manuscript's overall quality.

 

  5. Section 4.1 refers to Recall, Prediction, Accuracy and F1-Score as indices and indexes. Recall, Prediction, Accuracy and F1-Score are neither indices nor indexes.

Response:

Thank you for your valuable feedback. We have reviewed the manuscript and made the necessary corrections regarding the terminology used in Section 4.1. Specifically, we have replaced the terms "indices" and "indexes" with the appropriate term "metrics" when referring to Recall, Prediction, Accuracy, and F1-Score.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

As compared to the previous version, the authors have addressed all my comments and suggestions.

They have also provided an adequate response letter.

On this version, I have only one minor comment:

1 – Figure 5. Please change

Percision

->

Precision

Author Response

  1. Figure 5. Please change

Percision

->

Precision

Response:


Thank you for your feedback. We have revised the manuscript as per these comments to ensure clarity and completeness.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

I would like to thank the authors for the changes made. However, several issues should still be addressed:

- The dataset that has been used in the paper is highly unbalanced (415 depressed, 4719 undepressed) which is a major issue for any machine learning algorithm. The authors are kindly asked to better explain how they have taken this aspect into account.

- In section 3.2 the paper mentions that "maximum like-likelyhood estimates" have been used. Additional details are needed in order to demonstrate that the approach is scientifically sound and to make the research reproducible.

- Also in section 3.2 the paper states that "In this case, pruned the data set of variables containing those with missing values higher than 40%.". In order to be technically sound, the paper should specify how many records have been removed.

- The authors are kindly asked to check for similar issues in the entire paper.

Comments on the Quality of English Language

The quality of English should be improved in the entire paper. As an example, the meaning of "In this case, pruned the data set of variables containing those with missing values higher than 40%." is unclear. 

Author Response

Thank you for your valuable feedback. We appreciate your comments and suggestions, which have helped us improve the clarity and rigor of our manuscript. We have made the necessary revisions in response to your comments. Your input has been instrumental in enhancing the technical soundness of our work. Below are our responses to all comments:

 

  1. The dataset that has been used in the paper is highly unbalanced (415 depressed, 4719 undepressed) which is a major issue for any machine learning algorithm. The authors are kindly asked to better explain how they have taken this aspect into account.

Response:

To address this concern, we employed the following strategies:

K-Fold Cross Validation: In the manuscript, we utilized k-fold cross-validation to evaluate the performance of our model. This technique helps in mitigating the impact of data imbalance by ensuring that each fold contains a representative distribution of both classes. Specifically, the data is divided into k subsets (folds), and the model is trained and validated k times, each time using a different fold as the validation set and the remaining k-1 folds as the training set. This approach allows the model to be exposed to different subsets of data, including the minority class, in each iteration, promoting a more generalized learning process.

Evaluation Metrics: To provide a comprehensive evaluation of the model’s performance, we included metrics that are sensitive to class imbalance, such as the F1-score, precision, recall, and the area under the ROC curve (AUC-ROC). These metrics offer a balanced perspective on how well the model performs on both classes, beyond the overall accuracy.

Algorithm Selection: We also considered using algorithms that are inherently better suited to handle imbalanced data, such as ensemble methods (e.g., Random Forest, Gradient Boosting) that can better manage class distributions and improve classification performance for the minority class.

In conclusion, by incorporating these strategies, we aimed to ensure that our model is not unduly influenced by the imbalance in the dataset and can accurately identify both depressed and undepressed individuals. We believe that these measures have significantly enhanced the robustness and reliability of our results.
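As an illustration of the points above, here is a minimal sketch in plain Python. It is not the authors' code. The response does not state whether the folds were stratified, and plain k-fold splitting does not by itself guarantee that every fold keeps the class ratio, so the sketch assumes a stratified split. The function names `stratified_folds` and `precision_recall_f1` are hypothetical; only the class counts (415 and 4719) come from the reviewer's comment.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, class by class (assumed stratification)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold keeps roughly the class ratio.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Class counts quoted by the reviewer: 415 depressed (1), 4719 not depressed (0).
labels = [1] * 415 + [0] * 4719
folds = stratified_folds(labels, k=5)
for number, fold in enumerate(folds):
    positives = sum(labels[i] for i in fold)
    print(f"fold {number}: {len(fold)} samples, {positives} positive")

# A classifier that always predicts the majority class looks accurate
# but never finds a depressed participant.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
print("accuracy:", round(accuracy, 3))
print("precision, recall, F1:", precision_recall_f1(labels, always_negative))
```

The last lines show why accuracy alone is a weak indicator here: the always-negative baseline scores high on accuracy while its recall for the minority class is zero.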

 

 

  2. In section 3.2 the paper mentions that "maximum like-likelihood estimates" have been used. Additional details are needed in order to demonstrate that the approach is scientifically sound and to make the research reproducible.

Response:

We appreciate the reviewer’s insightful comments on the need for additional details regarding the "maximum likelihood estimates" (MLE) mentioned in Section 3.2 of our manuscript. Below, we offer a detailed account of the MLE methodology used in our study:

“When data is imperfect or missing, as it often is in real-world problems, the expectation maximization algorithm is typically employed to generate maximum likelihood estimates. Maximum likelihood estimates (MLE) find statistical model parameters that make observed data most probable. By estimating missing data (Expectation step) and optimizing parameters based on these estimates (Maximization step), the expectation maximization method iteratively improves estimates to the maximum likelihood estimates.”
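The two steps described in the quoted passage can be sketched on a toy problem. The response does not specify the statistical model used for the NHANES variables, so the example below assumes a bivariate Gaussian in which `x` is fully observed and `y` is missing for some records; the function name `em_bivariate_normal` and the data are hypothetical and chosen only for illustration.

```python
def em_bivariate_normal(x, y, iterations=50):
    """EM estimates for a bivariate Gaussian; y uses None for missing entries."""
    n = len(x)
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    m = len(complete)
    # Initialise all parameters from the complete cases only.
    mu_x = sum(xi for xi, _ in complete) / m
    mu_y = sum(yi for _, yi in complete) / m
    sxx = sum((xi - mu_x) ** 2 for xi, _ in complete) / m
    sxy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in complete) / m
    syy = sum((yi - mu_y) ** 2 for _, yi in complete) / m

    for _ in range(iterations):
        # E-step: expected y and y^2 for missing entries, given x and current parameters.
        slope = sxy / sxx
        residual_var = syy - slope * sxy
        e_y, e_yy = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                expected = mu_y + slope * (xi - mu_x)
                e_y.append(expected)
                e_yy.append(expected ** 2 + residual_var)
            else:
                e_y.append(yi)
                e_yy.append(yi ** 2)
        # M-step: re-estimate means and (co)variances from the expected statistics.
        mu_x = sum(x) / n
        mu_y = sum(e_y) / n
        sxx = sum(xi ** 2 for xi in x) / n - mu_x ** 2
        sxy = sum(xi * ey for xi, ey in zip(x, e_y)) / n - mu_x * mu_y
        syy = sum(e_yy) / n - mu_y ** 2

    return {"mu_x": mu_x, "mu_y": mu_y, "sxx": sxx, "sxy": sxy, "syy": syy}


# Noise-free toy data with y = 2x; y is missing for the three largest x values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, None, None, None]
estimates = em_bivariate_normal(x, y)
print("complete-case mean of y:", 4.0)
print("EM estimate of the mean of y:", round(estimates["mu_y"], 6))
```

In this toy case the complete-case mean of `y` is biased low because the missing values belong to the largest `x`; the EM estimate uses the relation between `x` and `y` to correct for that.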

  3. Also, in section 3.2 the paper states that "In this case, pruned the data set of variables containing those with missing values higher than 40%.". In order to be technically sound, the paper should specify how many records have been removed.

Response:

To enhance the clarity and technical rigor of our manuscript, we provide the following detailed explanation: “A total of 189 variables were extracted from the NHANES database for 2015-16. After pruning and applying the expectation maximization technique, 110 distinctive features were selected.”
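The pruning rule mentioned here (dropping variables with more than 40% missing values) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the column names and values are invented, `None` stands in for a missing value, and the helper names `missing_fraction` and `prune_sparse_columns` are hypothetical.

```python
def missing_fraction(rows, column):
    """Share of records in which the given column is missing (None)."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def prune_sparse_columns(rows, threshold=0.40):
    """Drop every column whose missing fraction is higher than the threshold."""
    columns = sorted({name for row in rows for name in row})
    kept = [c for c in columns if missing_fraction(rows, c) <= threshold]
    pruned = [{c: row.get(c) for c in kept} for row in rows]
    return pruned, kept


# Invented toy records; the column names are placeholders, not NHANES variables.
rows = [
    {"age": 34, "bmi": 22.1, "sleep_hours": None},
    {"age": 51, "bmi": None, "sleep_hours": None},
    {"age": 47, "bmi": 27.8, "sleep_hours": 6.5},
    {"age": 29, "bmi": 24.0, "sleep_hours": None},
    {"age": 63, "bmi": 30.2, "sleep_hours": 7.0},
]
pruned_rows, kept_columns = prune_sparse_columns(rows)
print("kept columns:", kept_columns)
print("records before and after:", len(rows), len(pruned_rows))
```

Note that this rule removes variables (columns), not records, so the number of records is unchanged; only the remaining missing entries would then be handled by the expectation maximization step.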

  4. The authors are kindly asked to check for similar issues in the entire paper.

Response:

 

We have carefully reviewed the entire paper for similar issues as per your suggestion. We have made the necessary revisions to ensure consistency and accuracy throughout the manuscript.

 

The quality of English should be improved in the entire paper. As an example, the meaning of "In this case, pruned the data set of variables containing those with missing values higher than 40%." is unclear. 

 

We have undertaken a thorough review of the entire manuscript to enhance the clarity and readability of the text. Specifically, we have revised the sentence you highlighted. The revised sentence now reads: “In this scenario, the data set was cleaned to exclude variables with missing values greater than 40%.” We are confident that these revisions improve the overall quality of English and ensure that the content is clear and understandable.

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

Comments and Suggestions for Authors

I would like to thank the authors for the changes made. The paper should still better explain how the fact that the dataset is unbalanced has been handled.

Comments on the Quality of English Language

The language used in the paper should be improved, as also mentioned in the previous reviews.

Author Response

  1. The paper should still better explain how the fact that the dataset is unbalanced has been handled.

Response: Thank you for your constructive feedback on our manuscript. We appreciate your careful review and the suggestion to clarify how we addressed the issue of dataset imbalance. In response, we have added a detailed explanation in the Methods section of the manuscript. Specifically, we describe the techniques we employed to handle the unbalanced dataset, including k-fold cross-validation and an oversampling technique. [Oversampling is a technique used to balance the dataset by increasing the number of instances in the smaller class. In this study, oversampling was applied exclusively to the training data portion during each cross-validation iteration.] These additions aim to provide a clearer understanding of our approach to managing the imbalance in the dataset. Please find the revised version of our manuscript attached. We believe these revisions enhance the clarity and comprehensiveness of our work.
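A minimal sketch of oversampling restricted to the training portion of each cross-validation iteration is given below. The response does not name the oversampling method, so random duplication of minority-class records is assumed here; the function names and the toy labels are hypothetical.

```python
import random


def random_oversample(indices, labels, rng):
    """Duplicate minority-class indices at random until the classes are balanced."""
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap to the largest class.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def cv_splits_with_oversampling(labels, k, seed=0):
    """Yield (training, validation) index lists; only training is oversampled."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        training = [
            idx
            for number, fold in enumerate(folds)
            if number != held_out
            for idx in fold
        ]
        yield random_oversample(training, labels, rng), validation


# Toy labels: 20 minority (1) and 180 majority (0) records.
toy_labels = [1] * 20 + [0] * 180
for training, validation in cv_splits_with_oversampling(toy_labels, k=5):
    train_pos = sum(toy_labels[i] for i in training)
    val_pos = sum(toy_labels[i] for i in validation)
    print(f"train: {len(training)} ({train_pos} positive), "
          f"validation: {len(validation)} ({val_pos} positive)")
```

Keeping the validation fold untouched matters: if duplicated minority records were created before splitting, copies of the same record could land in both the training and validation sets and inflate the reported scores.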

 

  2. The language used in the paper should be improved, as also mentioned in the previous reviews.

Response: We thank the reviewer for their continued feedback on the language used in the paper. In response to your comments, we have undertaken a comprehensive revision of the manuscript to improve the quality of the English language. We have carefully reviewed and edited the text to enhance clarity, grammar, and overall readability.

We believe these improvements significantly enhance the presentation of our work, and we hope the revised manuscript meets the journal's standards.

Author Response File: Author Response.pdf
