Peer-Review Record

Forecast of Medical Costs in Health Companies Using Models Based on Advanced Analytics

Algorithms 2022, 15(4), 106; https://doi.org/10.3390/a15040106

by Daniel Ricardo Sandoval Serrano^1,2,†, Juan Carlos Rincón^1,†, Julián Mejía-Restrepo^1,†, Edward Rolando Núñez-Valdez^2,*,† and Vicente García-Díaz^2,†

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 9 February 2022 / Revised: 18 March 2022 / Accepted: 20 March 2022 / Published: 23 March 2022

(This article belongs to the Special Issue Algorithms in Decision Support Systems Vol. 2)

Round 1

Reviewer 1 Report

In this study, the authors propose an approach to forecasting medical costs, which is an interesting research topic. However, there are some problems, listed as follows:

The specific aims and contribution should be systematically pointed out in the abstract section. Moreover, the experimental results should be briefly described.
In the introduction section, the contributions or characteristics of the proposed approaches should be clearly pointed out.
I suggest that the authors reorganize the structure of the methods. Forecasting the medical cost and finding the factors of the cost are two different functions. What are the relations between these two approaches? The authors need to point this out in the introduction section.
The authors should detail the algorithms used, and the description of the characteristics of these algorithms should be moved to the introduction section. Moreover, each algorithm (including its equations) should be described systematically from inputs to outputs.
The statistical information of the dataset used should be detailed. Moreover, the variables or attributes used are not clear to the reader. The authors should provide the meaning of each variable or attribute.
The statistical analysis results should be detailed and discussed. Many analysis variables are not clear. For example, what are “without comorbidity” and “with one comorbidity”? They are not introduced in the previous description. In Table 3, what are variables (1) to (10)?
In Section 2.3, the authors detail the characteristics of the LSTM, resampling, and clustering techniques and the previous research. These descriptions should be reworded and moved to the introduction section. In this section, the authors need to detail the LSTM, resampling, and clustering algorithms from inputs to outputs, so that readers can rebuild the proposed system from the manuscript. The relations between resampling, LSTM, and clustering are very difficult to understand, and the authors need to describe systematically how these techniques feed into one another. The problems solved by using these techniques should be emphasized in the introduction section.
In the experimental results, the authors need to detail the experimental conditions. The parameters of the LSTM, resampling, and clustering steps should be detailed and examined. Do the chosen parameters yield the best performance? What are the effects of different parameter values?
For the LSTM results, what are the training and evaluation datasets? How is the model objectively evaluated? I suggest that the MSE and RMSE metrics used be defined in this section. Moreover, can the authors provide a baseline system against which to compare the proposed approach? Are there different approaches provided by previous research? Without a comparison with other approaches, it is very hard to evaluate the proposed approach.
For the results of the cluster algorithm, why 10 clusters? The authors need to provide the reasons. The order of LSTM and cluster would confuse the readers. The authors only showed the results of LSTM by using the data in the third cluster. What are the results of other clusters? The authors examined the results of the third cluster with the LSTM, which was trained from all data. If the LSTM is trained by using a specified cluster, can it improve the accuracy of the specified cluster? Therefore, the order of the proposed process should be clearly detailed and the authors need to provide the reasons in the introduction section.
The authors only provide the results, but a discussion of each result is very important for a study. Adding such discussion would greatly improve the quality of the manuscript.
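
To make the request for detailed MSE/RMSE definitions and a baseline comparison concrete, the following is a minimal sketch in Python. The persistence ("previous value") baseline, the function names, and all numbers are illustrative assumptions; none of them come from the manuscript.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, expressed in the units of the data."""
    return math.sqrt(mse(actual, predicted))


def persistence_forecast(series):
    """Baseline: predict each value as the observation just before it."""
    return series[:-1]


# Hypothetical monthly costs (illustrative numbers only).
costs = [100.0, 110.0, 105.0, 120.0, 130.0]
targets = costs[1:]                          # values to be predicted
baseline_pred = persistence_forecast(costs)  # previous-month values
model_pred = [108.0, 107.0, 118.0, 127.0]    # hypothetical model output

baseline_rmse = rmse(targets, baseline_pred)
model_rmse = rmse(targets, model_pred)
print(f"baseline RMSE={baseline_rmse:.2f}, model RMSE={model_rmse:.2f}")
```

Reporting the model's error next to that of a simple baseline on the same held-out period is one way to show whether a more complex model is worth its added cost.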

Author Response

We sincerely appreciate your thoughtful and careful revision of our manuscript. We certainly consider your suggestions very useful for the improvement of our work. Below you will find an inline answer to each of the comments you provided.

Author Response File: Author Response.docx

Reviewer 2 Report

I would like to thank the authors for this work. Please consider my feedback below.

(1)

I believe that the study could have a good potential as a case study. However, the presentation of that aspect needs to be emphasized and presented in a better way throughout the manuscript, especially in the abstract, intro, and conclusions.

(2)

From a practical standpoint, the study is interesting. However, the contributions are not clear yet.

My view is that the study applies well-explored methods in the healthcare context. The authors should clearly describe how such a study would advance the literature.

(3)

The rationale of the methodology adopted is not clear. For example, the intuition behind applying clustering needs further elaboration.

(4)

The paper needs a bit of organization. The related work should be presented in a separate section aside from the methodology. I recommend including less background details on ML preliminaries, as the present work is a quite practical use case. The reader would be more interested in learning about the use case itself, rather than preliminaries easily found in the literature.

(5)

The clustering could be better presented as an exploratory analysis. As such, the clustering could be presented at an earlier point, before the LSTM.

(6)

The introduction lacks referring to related work that applied patient clustering using ML. For example:

https://doi.org/10.3390/ijerph18041919

https://doi.org/10.1145/3014812.3014874

(7)

I suggest presenting Table 3 as a correlation matrix in a figure.

(8)

The statement in line 207 is not clear, please review it:

“To train the LSTM network with our data, we need to convert the data into a 3D format in the form accepted by LSTM.”

(9)

It is not clear at all why K was set to 15; no explanation or convincing evidence is given for that choice.

(10)

How were the descriptions of the 15 clusters in Table 7 decided?

(11)

I am not clear at all how the authors consider the MSE and RMSE as a form of "Explainability". As far as I see, RMSE and MSE didn't provide any explainability or interpretability for the predictions nor the feature influence, for example.
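
Regarding the "3D format" questioned in comment (8): recurrent layers in common frameworks such as Keras take input shaped as (samples, timesteps, features). The sketch below shows one plausible way to build such input from a univariate series with a sliding window. The window length of 3 and the helper name `make_windows` are assumptions for illustration; the manuscript's actual settings are not stated in this record.

```python
def make_windows(series, timesteps):
    """Build (X, y) from a 1-D series.

    X is a nested list of shape (samples, timesteps, 1): each sample is a
    window of `timesteps` consecutive values, each wrapped as one feature.
    y holds the value that immediately follows each window.
    """
    if timesteps < 1 or len(series) <= timesteps:
        raise ValueError("series must be longer than the window length")
    X, y = [], []
    for start in range(len(series) - timesteps):
        window = series[start:start + timesteps]
        X.append([[value] for value in window])
        y.append(series[start + timesteps])
    return X, y


# Hypothetical series (illustrative numbers only).
series = [10, 20, 30, 40, 50, 60]
X, y = make_windows(series, timesteps=3)
shape = (len(X), len(X[0]), len(X[0][0]))
print(shape)  # (samples, timesteps, features)
```

With several input variables per time step, each inner list would hold one value per variable, and the last dimension would grow accordingly.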

Author Response

Author Response File: Author Response.docx

Reviewer 3 Report

This study presents meaningful insights regarding medical cost forecasting with the implementation of machine learning methods. The integration of those methods in modern applications and workflows could benefit the health sector, resulting in smart management and optimization of resources. However, there are several points that need to be addressed for the overall improvement of the manuscript:

The Introduction needs to be expanded to include:
1. A brief overview of current machine learning methods that find application in the health sector, followed by a literature review of related advanced analytics projects.
2. Line 76: Enriching and clarifying the purpose and contribution of this work.
3. A brief overview of the following sections in the final sentences of this section.
In Materials & Methods:
1. Line 106: Briefly include some common dataset challenges (e.g. missing values) and discuss the impact of data transformations (e.g. performance benefits, faster convergence).
2. Line 150: Expand on the explanation of the seasonal effect for this domain of time series.
3. Lines 156-166: This segment could be part of the literature review and could be moved to the introduction during the restructuring of the manuscript.
4. Line 203: The usage of normalization and data scaling needs to be properly justified.
5. Line 206: Details on the 3D data format of LSTMs and how it relates to the data of this project need to be added.
6. Details on the train/test split, as well as the number of epochs needed for training the LSTM, need to be added.
7. Line 242: The choice of K-means for clustering for this dataset could be further explained and justified if it is not arbitrary.
In Results:
1. The section could be expanded to include a more detailed interpretation of each figure, making a connection between input and output data through insightful comments and examples in text.
2. Line 315: Additional details regarding this feature could be outlined (e.g. which are the available time period options that can be selected.)
3. The significance of the resulting scale-dependent error metrics needs to be properly explained, given the actual values of the datasets used (describing that the metrics are relatively small based on the units of the data). Additionally, the authors are encouraged to include a scale-independent error metric such as MAPE for forecast evaluation.
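
The distinction between scale-dependent and scale-independent metrics raised in the last point can be illustrated with a short sketch. The numbers and function names below are assumptions for illustration only. Note that MAPE is undefined when an actual value is zero, which the sketch rejects explicitly.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error (scale-dependent: same units as the data)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (scale-independent)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical costs (illustrative numbers only).
actual = [200.0, 400.0, 500.0]
predicted = [180.0, 440.0, 500.0]

# The same data expressed in a unit 1000 times smaller.
actual_scaled = [a * 1000 for a in actual]
predicted_scaled = [p * 1000 for p in predicted]

print(rmse(actual, predicted), rmse(actual_scaled, predicted_scaled))
print(mape(actual, predicted), mape(actual_scaled, predicted_scaled))
```

Changing the unit multiplies RMSE by the same factor but leaves MAPE unchanged, which is why a percentage metric is easier to interpret without knowing the magnitude of the data.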

Author Response

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

All the issues I raised have been carefully addressed. However, on page 10, line 312, the description of the implementation of the LSTM network is not suitable. It reads like code, which depends on the programming language. The authors should describe it in technical terms, and the detailed parameters should be moved to the Experimental Results section.

Author Response

Notes Review 1

We appreciate this suggestion. We explain the important network parameters in section 3.3.2. In addition, we complete the network results in Section 4.1. This is resolved in the Model Implementation from lines 263 and 352 below, respectively.

Author Response File: Author Response.docx

Reviewer 2 Report

I would like to thank the authors for their response. The manuscript has been certainly improved, and the feedback is much appreciated. However, there are still some points to consider, please.

(1)

The abstract should mention more details about the case study under consideration, related to healthcare in Colombia. The industrial aspect of the study should be made clear in the abstract by referring to the Keralty organization. This would make the article more interesting to the reader as a use case from industry.

(2)

The contributions mentioned in the introduction are too broad. To be more convincing, this part needs to be more specific and aligned with the case study under consideration (i.e., healthcare in Colombia).

(3)

The related work should be extended to include examples of contributions that applied clustering for healthcare problems. For example:

https://doi.org/10.3390/ijerph18041919

https://doi.org/10.1145/3014812.3014874

(4)

The choice of the numbers of clusters remains a major issue to resolve. It appears clearly that 15 is not the elbow point on the elbow figure. As such, I am afraid that I am not convinced about the rationale behind that choice.

The elbow method may not work well in all cases. So, you can consider other methods for evaluating the coherence of clusters, such as the Silhouette score. Furthermore, the choice of K could be based on domain knowledge.
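
As an illustration of the Silhouette check suggested here, the sketch below computes the mean silhouette coefficient for two candidate partitions of the same toy data. The data, the hand-assigned labels (standing in for clustering output at K=2 and K=3), and the function names are assumptions for illustration; they are not taken from the manuscript.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: coefficient is defined as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy 1-D data with three visible groups (illustrative only).
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
labels_k2 = [0, 0, 0, 0, 1, 1]  # hypothetical partition for K=2
labels_k3 = [0, 0, 1, 1, 2, 2]  # hypothetical partition for K=3

score_k2 = silhouette_score(points, labels_k2)
score_k3 = silhouette_score(points, labels_k3)
print(f"K=2: {score_k2:.3f}, K=3: {score_k3:.3f}")
```

In practice the labels would come from running the clustering algorithm once per candidate K, and the K with the highest mean silhouette would be a candidate to weigh alongside the elbow plot and domain knowledge.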

Author Response

Notes Review 2

I would like to thank the authors for their response. The manuscript has been certainly improved, and the feedback is much appreciated. However, there are still some points to consider, please.

(1)

We completely agree with this suggestion. Reference is made to a health case in the company Keralty. This is resolved in the abstract from line 14 below.

(2)

We completely agree with this suggestion. We revised and adjusted the introduction, adding work on clustering techniques used in the health sector in Colombia. This is resolved in the introduction from line 106 below.

(3)

The related work should be extended to include examples of contributions that applied clustering for healthcare problems. For example:

https://doi.org/10.3390/ijerph18041919

https://doi.org/10.1145/3014812.3014874

Thank you very much for this suggestion and the references, very appropriate and related to this work. This is resolved in related work from line 139 below.

(4)

We agree with your suggestion. The optimal number is shown by the elbow method and, according to your suggestion, is validated with the Silhouette method, confirming the optimal number of clusters. This is resolved in the model implementation from line 328 below.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors have responded successfully to the majority of the comments/recommendations.

Author Response

Notes Review 3

Thank you very much for your feedback. We revised and adjusted the introduction, adding work on clustering techniques used in the health sector in Colombia. This is resolved in the introduction from line 106 below. We complement the research design for the use of clusters. This is resolved in the model implementation from line 328 below.

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

Thanks very much for the response, I really appreciate the authors' perseverance. I have no major concerns at the moment.

For the final version, please improve the quality of the figures; most of them are blurry and hard to read.

Also, it would be great to have another review over the text in order to improve the quality of language.
