Article
Peer-Review Record

Fine-Grained Algorithm for Improving KNN Computational Performance on Clinical Trials Text Classification

Big Data Cogn. Comput. 2021, 5(4), 60; https://doi.org/10.3390/bdcc5040060
by Jasmir Jasmir, Siti Nurmaini and Bambang Tutuko
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 28 August 2021 / Revised: 25 October 2021 / Accepted: 25 October 2021 / Published: 28 October 2021

Round 1

Reviewer 1 Report

The manuscript written by the authors is scientifically sound and well written. However, there are some suggestions for the betterment of the manuscript, which are listed below:

  1. There are some typographic and minor English language mistakes. Would you please correct them before submitting the final version?
  2. The introduction section has been written mostly with an emphasis on the literature review. However, it would be great to add a dedicated paragraph explaining the motivation to undertake this research.
  3. Instead of providing the link inside the text, it would be nice to keep the data source "https://clinicaltrials.gov" in the footnote. It is great that the authors have provided a snippet of how the data looks. However, they have not explained the "class" column in it. Therefore, it is recommended to add a few lines to describe the "class" in the data to understand the manuscript better. 
  4. The snippet of the FGA algorithm inside the text can be removed. The authors can supply the information as supplementary information towards the end if they want. A piece of code without much explanation is only making the text difficult to read. 
  5. It is commendable that the authors took the time to write about different classification algorithms. However, it would be great if they defined classification at the beginning. The description of each algorithm could be improved without making it longer. Also, some supporting diagrams would be appreciated, e.g., a small diagram explaining KNN, DT, or Random Forest.
  6. The Results and Discussion section has a few evaluation metrics. It would be great if the authors could add a section defining them. Which metric do they think is most relevant to this problem?
  7. It would be great if the authors could highlight the conclusions drawn from these results and tables. 

Author Response

From Reviewer-1

Comments and Suggestions for Authors

The manuscript written by the authors is scientifically sound and well written. However, there are some suggestions for improving it, which are listed below:

1. There are some typographical errors and minor English language mistakes. Could you please correct them before submitting the final version?

Response: Thank you. I will fix these as soon as possible.

2. The introduction is mostly written with an emphasis on the literature review. However, it would be great to add a dedicated paragraph explaining the motivation for doing this research.

Response: Thank you, understood. I will add an explanation of the motivation for this research.

3. Instead of providing in-text links, it would be nice to keep the data source "https://clinicaltrials.gov" in a footnote. It is great that the authors have provided a snippet of the data. However, they have not explained the "class" column in it. Therefore, it is recommended to add a few lines describing the "class" in the data so the manuscript is easier to understand.

Response: Thank you. I will explain the "class" column in the data in the labeling section; an illustrative sketch follows.
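For illustration only: a minimal sketch of what a labeled record with a "class" column might look like. The column names and label values below are hypothetical, since this record does not reproduce the actual dataset schema.

    # Hypothetical rows; the real clinicaltrials.gov-derived schema and
    # label values are not shown in this review record.
    import pandas as pd

    df = pd.DataFrame({
        "criteria_text": [
            "Inclusion: adults aged 18-65 with type 2 diabetes",
            "Exclusion: pregnant or breastfeeding women",
        ],
        "class": ["inclusion", "exclusion"],  # hypothetical label values
    })
    print(df)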

4. The snippet of the FGA algorithm in the text can be removed. The authors can supply it as supplementary information towards the end if they wish. A piece of code without much explanation only makes the text harder to read.

Response: Thank you. I will fix this soon.

5. It is commendable that the authors took the time to write about the different classification algorithms. However, it would be nice if they defined classification at the beginning. The description of each algorithm can be improved without making it longer. Also, some supporting diagrams would be appreciated, e.g., a small diagram explaining KNN, DT, or Random Forest.

Response: Thank you, understood. I will fix this; a rough illustration of these classifiers follows.
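As a rough illustration of the classifiers named in this comment, the sketch below fits KNN, a Decision Tree, and a Random Forest on toy TF-IDF features with scikit-learn. The texts and labels are invented; this is not the paper's experimental setup.

    # Toy comparison of three classifiers on hypothetical TF-IDF features.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    texts = [
        "patients with type 2 diabetes",           # hypothetical criteria texts
        "healthy adult volunteers",
        "subjects with uncontrolled hypertension",
        "no prior investigational medication",
    ]
    labels = [1, 0, 1, 0]                          # hypothetical class labels

    X = TfidfVectorizer().fit_transform(texts)
    for clf in (
        KNeighborsClassifier(n_neighbors=3),
        DecisionTreeClassifier(random_state=0),
        RandomForestClassifier(random_state=0),
    ):
        clf.fit(X, labels)
        print(type(clf).__name__, clf.predict(X))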

6. The Results and Discussion section has several evaluation metrics. It would be great if the authors could add a section defining them. Which metrics do they think are most relevant to this problem?

7. It would be great if the authors could highlight the conclusions drawn from these results and tables.

Response: Thank you. I will fix this as soon as possible; a sketch of the usual metrics follows.
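For reference, a minimal sketch of the evaluation metrics commonly reported for text classification (accuracy, precision, recall, F1). The record does not state which metrics the paper uses, so this particular set is an assumption.

    # Standard classification metrics on hypothetical predictions.
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    y_true = [0, 1, 1, 0, 1]  # hypothetical gold labels
    y_pred = [0, 1, 0, 0, 1]  # hypothetical model predictions

    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")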

 

Reviewer 2 Report

This paper proposed an improvement to the computational performance of one of the supervised learning methods, namely KNN, in building a clinical trial document text classification model by combining KNN with the Fine-Grained Algorithm (FGA).
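The FGA itself is not described in this record, so the sketch below is only a stand-in illustration of what "improving KNN computational performance" can mean: it times scikit-learn's brute-force neighbor search against a tree-based index on hypothetical dense features. This is explicitly not the authors' method.

    # Times two KNN search strategies on random data; FGA is not implemented here.
    import time
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.random((5000, 20))           # hypothetical dense feature matrix
    y = rng.integers(0, 2, size=5000)    # hypothetical binary labels

    for algo in ("brute", "ball_tree"):
        clf = KNeighborsClassifier(n_neighbors=5, algorithm=algo).fit(X, y)
        t0 = time.perf_counter()
        clf.predict(X[:500])
        print(f"{algo}: {time.perf_counter() - t0:.3f} s")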

Some comments:

1) The validation and usability of the whole prediction system are missing. The system should be validated by a clinical domain expert, with the validation evaluated through usability.

2) It is not clear why the other supervised approaches (Random Forest, Decision Tree, or SVM), unlike KNN + FGA, do not improve computational performance or speed up model evaluation. Are there structural reasons depending on the classification models or on the quality of the clinical trial text data? On the FGA algorithm? On the combination of both? In conclusion, why does KNN + FGA improve computation time while the other supervised models fail to?

3) The authors have not included a Related Work section to compare and contrast the presented work with other works proposed in the literature in the same field. A comparison between the results obtained and those proposed in the literature is missing.

4) The technical soundness, depth of analysis, and technical description are very poor.

5) Validation of the correctness of the classification by a domain expert is missing. How could the text classification system aid domain experts in improving patients' quality of life?

As for the reproducibility of the work:
(1) Although experiments have been performed on a real-world dataset, there is no indication that the data are publicly available. I think it would be important to evaluate the presented approach on publicly available datasets as well. I think the authors should make their code publicly available in order to allow reproduction of the experiments and reuse of the approach in other domains.

Author Response

From Reviewer-2

Comments and Suggestions for Authors

This paper proposes improving the computational performance of one of the supervised learning methods, namely KNN, in building a clinical trial document text classification model by combining KNN with the Fine-Grained Algorithm.

Some comments:

1) Validation and usability of the entire prediction system are absent. The system should be validated by a clinical domain expert, with the validation evaluated through usability.

Response: I apologize; this work focuses on improving computational performance. I cannot have the data validated by clinical experts, because the data were already validated by previous researchers, so I use these data only for testing.

2) It is not clear why the other supervised approaches (Random Forest, Decision Tree, or SVM), unlike KNN + FGA, did not improve computational performance or speed up model evaluation. Are there structural reasons depending on the classification models or on the quality of the clinical trial text data? On the FGA algorithm? On the combination of the two? In conclusion, why does KNN + FGA improve computation time while the other supervised models fail to?

Response: Thank you for the input. I will explain in the manuscript the reasons for the behavior of the other supervised learning methods, but I will remove the Random Forest and Decision Tree sections from the manuscript, because they were not used by previous researchers.

3) The authors have not included a Related Work section to compare and contrast the presented work with other works proposed in the literature in the same field. There is no comparison between the results obtained and those proposed in the literature.

Response: I will explain this in the manuscript.

4) The technical soundness, depth of analysis, and technical description are very poor.

Response: Thank you. I will fix this.

5) Validation of the correctness of the classification by a domain expert is missing. How can a text classification system help domain experts improve patients' quality of life?

Response: Thank you. This is indeed one of the shortcomings of this study; it focuses on the computer science field, not on the medical field.

As for the reproducibility of the work: (1) Although experiments have been carried out on a real-world dataset, there is no indication that the data are publicly available. I think it is important to evaluate the presented approach on publicly available datasets as well. I think the authors should make their code publicly available to allow reproduction of the experiments and reuse of the approach in other domains.

Response: Thank you.

Round 2

Reviewer 2 Report

The corrections made by the authors are in line with the requirements proposed during the reviewing phase.

Author Response

Thanks.
