Peer-Review Record

Evaluating Information-Retrieval Models and Machine-Learning Classifiers for Measuring the Social Perception towards Infectious Diseases

Appl. Sci. 2019, 9(14), 2858; https://doi.org/10.3390/app9142858
by Oscar Apolinario-Arzube 1,†, José Antonio García-Díaz 2,*,†, José Medina-Moreira 1, Harry Luna-Aveiga 1 and Rafael Valencia-García 2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 28 June 2019 / Revised: 12 July 2019 / Accepted: 13 July 2019 / Published: 18 July 2019
(This article belongs to the Special Issue Intelligent Health Services Based on Biomedical Smart Sensors)

Round 1

Reviewer 1 Report

In this study, the authors take a machine-learning-based approach in the emerging field of infodemiology to tackle the challenge of public-health surveillance for the early detection of infectious-disease epidemics.

They have used Twitter data for this purpose. Overall, the manuscript is soundly written and provides clear, easy-to-read descriptions of concepts such as infodemiology and opinion mining, which are essential for following the work later on. The reason for using Twitter is also well justified, so I appreciate the authors' efforts in this regard.

Although I find the article generally easy to read, innovative, and well written, I have five major comments, followed by a few minor points/suggestions for the authors:

In Sections 3.2 and 3.3 there are various figures and tables that show the behaviour of the RF and SMO models with respect to the different corpora and the different char-grams and word-grams. For most of these figures, the text merely describes what the graph already shows. I would expect more context and explanatory information to help the reader relate to and interpret the findings, rather than a mere description of the graphs.

The graphs for the various word/char-gram sizes and classifiers are not surprising in themselves: as in most machine-learning tasks, performance rises in an initial phase, plateaus in a second phase, and decreases once the models overfit. The important finding would be to identify what the two machine-learning modalities have in common, where they actually differ, and why.
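
To make that comparison concrete, here is a minimal sketch in Python with scikit-learn. Note that the paper uses Weka: the toy texts and labels, the LinearSVC standing in for SMO, and the n-gram ranges below are my own illustrative assumptions, not the authors' setup.

# Sketch: put word-gram and char-gram features side by side for two classifiers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Toy tweets and labels (1 = infectious-disease related); purely illustrative.
texts = ["me duele la cabeza y tengo fiebre", "hoy me siento muy bien",
         "creo que tengo gripe", "que dia tan soleado"] * 50
labels = [1, 0, 1, 0] * 50

feature_sets = {
    "word 1-2 grams": CountVectorizer(analyzer="word", ngram_range=(1, 2)),
    "char 2-4 grams": CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
}
classifiers = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LinearSVC (SMO stand-in)": LinearSVC(),
}

for feat_name, vectorizer in feature_sets.items():
    X = vectorizer.fit_transform(texts)          # sparse document-term matrix
    for clf_name, clf in classifiers.items():
        acc = cross_val_score(clf, X, labels, cv=5, scoring="accuracy").mean()
        print(f"{feat_name:15s} + {clf_name:25s}: {acc:.3f}")

Reporting this kind of paired table, together with a short discussion of where RF and SMO diverge and why, would address the point above.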

Next is the question of validation. The manuscript describes a methodology but does not clearly validate it. Since this is a binary classification task in which a random classifier achieves 50% accuracy, some instances at 67% and a maximum of ~90% is certainly a good starting point, but not very impressive yet. This is partially due to the complexity of the phenomenon the authors have chosen. For this I would suggest a much more direct approach: pick a topic that they can validate. For example, weather: apply the methodology and categorize the weather attributes. Since weather data are available for any region, they could first predict the weather from tweets and then check the predictions against the actual weather reports for the time periods and places where the tweets were generated. This would give us a good indication that the methodology is sound and can also be trusted for infectious diseases.
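
As an illustration of the gap over chance, a small sketch using scikit-learn's DummyClassifier; the labels, and the 0.90 figure standing in for the best reported score, are assumptions on my part, not the authors' numbers.

# Sketch: compare a reported accuracy against chance-level baselines.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

X = np.zeros((200, 1))            # features are irrelevant for dummy baselines
y = np.array([1, 0] * 100)        # balanced binary labels, as in the corpus

for strategy in ("uniform", "most_frequent"):
    baseline = DummyClassifier(strategy=strategy, random_state=0)
    acc = cross_val_score(baseline, X, y, cv=5, scoring="accuracy").mean()
    print(f"{strategy:13s} baseline accuracy: {acc:.2f}")   # both ~0.50 here

print("best reported accuracy: ~0.90")   # the margin over 0.50 is what matters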

My next point: I understand the difficulty of coming up with a good corpus size and content for machine-learning purposes, but I believe the article should provide more details about the practical applicability of such an approach. After reading the article, I have more questions in my mind than are answered in the text:

Is the corpus size or the word/char-gram size dependent on the language? If yes, can we still transfer what we learn from a study in one language to another language?

How could such a first-line information system be used to inform public-health authorities? For example, the authors might consider building an online tool that analyzes the Twitter activity from the last week and generates graphs about the upcoming weeks/months. They could later follow this up with actual disease prevalence data; such data usually exist for flu on a seasonal basis.
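
To illustrate the kind of follow-up check I mean, a sketch that correlates weekly classifier output with official case counts. All numbers below are made up: in practice the tweet counts would come from the authors' classifier and the case counts from a public flu-surveillance report.

# Sketch: correlate weekly disease-related tweet counts with reported cases.
import numpy as np
from scipy.stats import pearsonr

weeks = [f"2019-W{w:02d}" for w in range(1, 11)]
predicted_positive_tweets = np.array([120, 150, 180, 260, 340, 310, 250, 190, 140, 110])
reported_flu_cases        = np.array([ 80, 100, 140, 210, 300, 290, 220, 160, 110,  90])

r, p = pearsonr(predicted_positive_tweets, reported_flu_cases)
print(f"Pearson r = {r:.2f} (p = {p:.3f}) across {len(weeks)} weeks")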

The fifth comment: can the authors give guidelines for creating a corpus, applicable to other domains, in a clear, point-by-point way? This would make the manuscript of interest to a larger audience.

Minor comments:

Line 161 ..."that distinctly "separates"the classes in a data-set"...

I would mention Twitter in the abstract/title, since it is actually the data source and an important part of the study.

Again, in the abstract I would give the number of tweets analyzed, used for the classifier, etc., to give an idea of the content of the article.


Author Response

Please see the attachment 

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors present a very interesting paper. They start with a nice and clear title.


I consider that it should be accepted in its present form, although my minor comments could be addressed in the final proofreading. I hope they can help to improve the paper.


- Page 2: "To conduct infodemiology studies it is necessary to find data sources...". This is a very interesting paragraph. I miss some updated references, though, besides [3].


- Page 4, Table 1: Sometimes such classifications include a neutral class, but that is not the case here. Why?

Nice balanced corpus with 3,090 positive tweets and 3,090 negative tweets; but, out of interest, how did you manage to obtain it?


- Page 5: well described; it is true that RF and SMO are good classifiers. However, I think the decision could be better argued beyond "they were applied in the past to perform NLP [24]", also because this is not a recent reference.


There is a good experimentation section.


- Future work: although it is an interesting point, I miss a longer statement of future lines, especially in this type of project. Perhaps data integration, as well as ontologies, could play a part in it and could also be included in the manuscript as keywords in this context.


Author Response

Please see the attachment 

Author Response File: Author Response.pdf
