
Open Access Article

Peer-Review Record

Probability Analysis of Hypertension-Related Symptoms Based on XGBoost and Clustering Algorithm

Appl. Sci. 2019, 9(6), 1215; https://doi.org/10.3390/app9061215

by Wenbing Chang¹†, Yinglai Liu¹, Yiyong Xiao¹†, Xingxing Xu¹, Shenghan Zhou¹*, Xuefeng Lu² and Yang Cheng³

Reviewer 1: Anonymous

Reviewer 2: Monika Frenger


Submission received: 12 January 2019 / Revised: 17 February 2019 / Accepted: 19 March 2019 / Published: 22 March 2019

(This article belongs to the Special Issue Data Analytics in Smart Healthcare)

Round 1

Reviewer 1 Report

The paper proposes a methodology that consists of obtaining patient data, pre-processing the information, applying clustering and, finally, using the XGBoost method to analyse the probability of having specific symptoms, which would allow an appropriate treatment to be carried out.

The paper deals with an interesting problem that would allow patients to receive treatment tailored to their specific medical problems. However, neither the work carried out in this study nor the results achieved are very clear. Several points need to be clarified:

- The Introduction section is very long and difficult to follow. I would recommend that the authors shorten it. This section should make clear the advantages and novelty of the study and give a summary of the structure and content of the paper. Specific information about clustering and the disease under study could then be moved to a dedicated section on related work.

- In Section 2.1 (Clustering methodology), the clustering method is explained in depth. If the method is not proposed in this paper, a brief description together with an appropriate reference would be enough. With respect to the references, the method is attributed to Alex Rodriguez, Allen and Alessandro Laio, but with this specification it is difficult to find it in the bibliography; for example, the reference I found does not include Allen. In general, the references in the paper should be reviewed in depth. Regarding the description of the method, line 162 mentions x_i and y_i; according to your notation, should this be x_i and x_j? In the same line, gamma_i represents a distance, but it is not clear which distance: is it the distance between the cluster centre and the point?

o In Figure 1(a), it is not clear what each element represents. It would be helpful to explain what each element represents in every subfigure of Figure 1.

- Similarly, in Section 2.2 (XGBoost method), if the XGBoost method is not proposed by the authors in this paper, a clear reference should be given. Also, if these methods are implemented in, or available from, a library, it is important to cite it.

- In Section 2.3 (Probability analysis process), it is stated that patients are grouped into categories by means of clustering. Is the number of categories fixed, or can the algorithm discover it? Normally, when the number of categories is known in the training examples and fixed, classification methods produce better results than clustering methods. Have the authors considered that option?

- In Section 2.3, it is specified that some data are selected for training and the rest for testing. In this type of problem, it is very common to use cross-validation to assess how the results of a statistical analysis will generalize to an independent data set. These methods perform multiple rounds of validation on different partitions of the data, and the validation results are combined (e.g., averaged) over the rounds to estimate the model's predictive performance.

- In Section 3 (Experiment and results discussion), the methodology followed to discuss the results shown is not clear.

o Figure 3, which shows a clustering diagram, is confusing. Is the centre of each cluster shown according to its density and distance values? If both cluster centres and patients are represented by density and distance, nearly all elements would be close to one of the clusters, wouldn't they?

o What do "core" and "halo" represent in Table 5?

o The analysis in Table 6 considers the real patients in each of the two categories. Is this grouping produced by the clustering method or by each patient's actual type of hypertension? Moreover, are the percentages computed by your methodology, or are they calculated directly by hand from the real information gathered for each patient?

o The procedure of your methodology is not clear. After the pre-processing step, the clustering method groups the patients; what, then, is the role of the XGBoost method? If this method is a decision tree, there is a restriction with respect to the variable to be used; how do the authors resolve this in this problem? These methods are very interpretable, so it might be interesting to show one of the trees obtained.

o It seems that the results of applying your methodology are given from line 384 to 397. A summary table with the most relevant results of the clustering method and the decision tree method would be helpful. Moreover, the results in Table 7 are given by your methodology for a new patient, but it is not clear how they are obtained, or whether they are obtained in general for all patients. Results are shown for the two categories, but would your methodology first determine the category? Does a decision tree method return a probability?

o I would recommend that the authors clearly specify the methodology and its steps, and report the results obtained using a cross-validation method.

o Given the problem to be solved, a clustering method is needed to determine the type of hypertension. However, is this information normally available for patients? It might be more interesting for the clustering method to determine categories according to the possible symptoms that patients could have; clustering methods can work with categories that are not known in advance and could group patients directly according to this factor, if this is the aim of the work.
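For reference, the quantities questioned in the comments on Section 2.1, Figure 3 and Table 5 can be illustrated with a minimal sketch of density-peaks clustering in the style of Rodriguez and Laio, assuming a cutoff kernel. In their notation, x_i and x_j are two points, rho_i counts the points within a cutoff d_c of x_i, delta_i is the distance from x_i to its nearest point of higher density, and cluster centres are the points with the largest gamma_i = rho_i * delta_i (the manuscript may use these symbols differently). The function and parameter names are hypothetical, the halo (border) assignment is omitted, and this is not the authors' implementation.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Minimal density-peaks clustering sketch (Rodriguez & Laio style).

    points: list of coordinate tuples; d_c: cutoff distance (assumed);
    n_centers: number of cluster centres to keep (assumed >= 1).
    Returns (rho, delta, labels).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points x_j with d(x_i, x_j) < d_c (cutoff kernel).
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points by decreasing density; ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    rank = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        # delta_i: distance from x_i to the nearest point ranked denser than it.
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Centres: largest gamma_i = rho_i * delta_i; ties go to the point ranked
    # denser, which guarantees the densest point (no denser neighbour) is a centre.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_centers]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points inherit the label of their nearest denser neighbour,
    # which is already labelled because we follow decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, labels
```

With two well-separated groups of four points and d_c = 0.5, the two isolated density peaks get large gamma values and become the centres, and every other point joins the group of its nearest higher-density neighbour. Plotting rho against delta for all points gives the kind of "decision graph" that Figure 3 appears to show.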

Finally, the authors should review the document thoroughly. There are many punctuation, spelling and grammatical mistakes:

- For example, a number appears after many words (e.g., lines 29, 40, 46, 58, 67, 69, 71, and so on).

- The references are given in different formats: initials and last name, full name and last name, with "et al.", with "others", and so on. Moreover, it would be helpful to specify the number used to identify each reference. I would recommend reviewing the format and using a single one consistently.

- In several places a full stop is followed by a lowercase letter (for example, lines 84, 335, 343, 364, …).

- Several sentences lack a subject. For example, page 7, lines 238, 249, …

- "Efficient and efficient method", line 146.

- The English should be reviewed.
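The cross-validation procedure recommended in the comments on Section 2.3 can be sketched as follows. This is a minimal k-fold example, not the authors' code: fit and score are hypothetical caller-supplied functions (for instance, wrappers around a boosted-tree classifier and an accuracy measure), and the folds are plain shuffled splits rather than stratified ones.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds (assumes k <= n_samples)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Every k-th shuffled index goes to the same fold, so folds are disjoint.
    return [idx[f::k] for f in range(k)]


def cross_validate(X, y, fit, score, k=5, seed=0):
    """Average a held-out score over k train/test partitions.

    fit(X_train, y_train) must return a model, and score(model, X_test, y_test)
    must return a number; both are hypothetical placeholders.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        # Train on k-1 folds, evaluate on the held-out fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx],
                            [y[i] for i in test_idx]))
    return sum(scores) / len(scores), scores
```

Averaging the per-fold scores gives the performance estimate described in that comment; the spread of the individual scores also indicates how sensitive the model is to the particular partition of patients.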

Author Response

Dear Reviewer,

Thank you very much for your suggestions. Based on them, I have modified the paper and added what you suggested. I have marked the modified parts with a yellow background. Thanks to your advice, the structure and content of this article have improved.

Thank you very much for your suggestions.

Author Response File: Author Response.pdf

Reviewer 2 Report

The submitted paper deals with a relevant topic and explores new paths by using innovative methods within this exploratory approach. The combination of cluster analysis with a subsequent boosting method opens up a thematic field in which a large number of influencing factors act, but in which large amounts of data may also be collected in the future.

Nevertheless, I would like to make some comments, which I hope will improve the paper:

- In the presentation of results, the XGBoost results for the symptom probabilities are well presented so far. In particular, non-linear effects are a very significant aspect of gradient boosting and account for much of the appeal of the method. For this reason, I would find it useful and very helpful if partial dependence plots were displayed at least for the two most influential variables, possibly even for all variables, in order to show the shape of these relationships graphically. This would also give the paper more weight.

- In my view, Tables 1(A) and 1(B) do not contribute much to the paper. I think it is important to normalize the data, as the authors do, but I do not consider printing the data to be useful. Instead of these tables, I would suggest presenting the independent variables in more detail in a table, so that possible connections between these variables can be addressed. In my view, this would add more value to the paper than the data.

- Smaller comments on form: there are numbers at the end of sentences in the paper (e.g., lines 29, 40, 67, 69, 71, 95, 101, 105, 127, 130, 133, 137, 141, 145, 152, 155), which are probably literature references. However, they are not enclosed in brackets and therefore do not fit the citation style.

- Tables: for some tables, explanations would help, so that the tables are understandable even without the text, e.g., Tables 3 and 5.
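The partial plots suggested in the first comment above can be computed for any fitted model. This is a minimal, model-agnostic partial dependence sketch, not the authors' code: predict is a hypothetical stand-in for a fitted model's prediction function (for example, a boosted ensemble's predicted symptom probability), and each grid value is substituted into the chosen feature for every row before the predictions are averaged.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is fixed at each grid value.

    predict: hypothetical callable mapping one row (list of numbers) to a number.
    X: list of rows; feature: column index; grid: values to evaluate.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so the data are not mutated
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(X))   # marginal effect at this grid value
    return curve
```

With the toy model predict = lambda r: r[0] ** 2 + r[1], rows [[0, 1], [0, 3]], feature 0 and grid [0, 1, 2], the curve is [2.0, 3.0, 6.0], which exposes the quadratic shape of that feature's effect; plotting such curves for the most influential variables would show the non-linear relationships mentioned in the comment.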

Author Response

Dear Reviewer,

Thank you very much for your suggestions. Based on them, I have modified the paper and added what you suggested. As you can see, the structure and content of this article have improved.

Thank you very much for your suggestions.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have answered and appropriately addressed the comments made in the previous review. I consider that the manuscript has improved considerably and can be accepted.
