Article
Peer-Review Record

Towards Identifying Author Confidence in Biomedical Articles

by Mihaela Onofrei Plămadă 1, Diana Trandabăț 2 and Daniela Gîfu 1,2,3,*
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 6 November 2018 / Revised: 16 January 2019 / Accepted: 17 January 2019 / Published: 21 January 2019
(This article belongs to the Special Issue Curative Power of Medical Data)

Round 1

Reviewer 1 Report

It is an interesting perspective to identify author confidence in biomedical articles. The presentation and language of this paper are good. I suggest a minor revision after the authors consider the following suggestions:


How can you prove that "author confidence" was the very sentiment identified after the experiment? As we know, sentiment analysis evaluates the attitude of a writer; however, joy, positivity, and various other sentiments appear in context. Among the various attitudes and stances, why is "author confidence", rather than some other attitude, the one identified after sentiment analysis? It would be better if the authors could elaborate on this.

The method of sentiment analysis in section 6.1 should be clearly introduced.

The dataset in section 3 should be clearly described. Is there a label or value for each text in the OAI corpus?

The keyword "malaria" selection is good. It would be better if the authors selected several other keywords and compared the corresponding results, as shown in Figures 6, 7 and 8. Consistency of the results is vital and would be more supportive.


Author Response

Thank you for taking the time to review our paper, and to give us feedback on how to improve the article.

 

Our instrument for detecting author confidence is based on three pipelined modules: a lexical, a syntactic and a semantic one. The figures for the three modules were slightly modified in the paper to make the required steps clearer. The lexical module tokenizes and lemmatizes the text and extracts the frequency of medical terms and the sentence length. The syntactic module identifies parts of speech, then active/passive voice and 1st/3rd person. The semantic module analyses the overall sentiment of the text, i.e. its polarity (negative or positive). This information is combined with the number of views of the article, the number of citations per author and per paper, and the number of cited papers. A final decision regarding the confidence of the author is issued after running all three main modules.
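For concreteness, the three-module pipeline described in our response could be sketched roughly as follows. This is an illustrative sketch only: the scoring heuristics, the sentence-length threshold, and the weights are all hypothetical assumptions, since the actual formula is not reproduced in this record.

```python
# Illustrative sketch only: every heuristic, threshold, and weight below
# is an assumption, not the formula actually used in the paper.

def lexical_score(text, medical_terms):
    """Score by medical-term frequency, discounted for very long sentences."""
    sentences = [s for s in text.split('.') if s.strip()]
    tokens = [t.strip('.,;:') for t in text.lower().split()]
    term_freq = sum(1 for t in tokens if t in medical_terms) / max(len(tokens), 1)
    avg_len = len(tokens) / max(len(sentences), 1)
    return term_freq * (1.0 if avg_len < 25 else 0.5)  # length threshold assumed

def syntactic_score(tokens_with_pos):
    """Score by the share of first-person tokens (a stand-in for voice/person cues)."""
    first_person = {"i", "we", "our"}
    hits = sum(1 for tok, _pos in tokens_with_pos if tok.lower() in first_person)
    return hits / max(len(tokens_with_pos), 1)

def semantic_score(polarity):
    """Map a sentiment polarity in [-1, 1] to [0, 1]."""
    return (polarity + 1) / 2

def confidence(text, medical_terms, tokens_with_pos, polarity,
               weights=(0.4, 0.3, 0.3)):
    """Weighted combination of the three module scores; weights are assumed."""
    scores = (lexical_score(text, medical_terms),
              syntactic_score(tokens_with_pos),
              semantic_score(polarity))
    return sum(w * s for w, s in zip(weights, scores))
```

The key design point is that each module produces an independent score in [0, 1], so the final decision reduces to choosing the weights, which is exactly the part the reviewers ask to see documented.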

 

For each step, the importance of different modules was discovered empirically, by looking at the data and through repeated tests.


Author Response File: Author Response.pdf

Reviewer 2 Report

This is a fine research idea but, in my opinion, not carried out sufficiently well. I have noticed some methodological omissions in the paper, which I will detail below; there are some inconsistencies that affect the paper's quality; and, finally, the language is acceptable but at some points needs significant attention.

Methodological omissions:

The authors want to describe how to infer author confidence from an OAI paper. It seems a nice idea, but at least in the paper I don't see how it is executed. The authors say it can be inferred from sentiment analysis, the number of words per sentence, and the frequency of domain-specific terms.

The inference is based on a score computed from all three factors, yet we never see a formula or a graphic for this score, and this has to change. How did the authors come up with the formula? How did you come up with the 'weights' on line 244? Did you try many weights, and how did you agree on the final weighting scheme?

Why don't we have a (sub)section discussing the choice of weights, a potential methodology, and how to arrive at the right scores? My direct choice would be 0.333 for each factor, but is that enough? Is it what we want?

Omissions/Inconsistencies:

- Lines 213-214: the authors point to Fig. 5 for sentiment identification and author profiling, but the figure doesn't use the same names. I believe that "Sentiment identification" is "sentiment", but why don't we call it so?

- I don't understand the choice of "malaria". Does it ensure that the articles will be comparable? Could another condition, e.g. AIDS, have been chosen?

- In your figures you use a rectangle with a solid line that contains various other "modules". The rectangle needs a title; otherwise, why are the modules grouped?

- l. 245-247: which good practice guides? A data journal mainly discusses what to include and which sections to use; is that fine?

- I believe Section 1 should be called "Introduction"

- As the authors highlight in l. 163, "we noticed that sentences, which are too long tend to be more difficult to follow". However, in section 6, they say "This section presents the results obtained for three features (sentiment analysis, average number of words per sentence and frequency of medical terms) in evaluating an author's confidence". I interpret this as saying that sentence length is used for checking confidence, which is the paper's goal. Please check again.

- Again, on l. 163 the authors use "we noticed". So did you run an experiment on sentence understanding, or is it your own observation?

- I have serious doubts about the proposed architecture in Figure 4. How can a syntactic analyzer work before POS tagging? How do you know what is, e.g., a verb if it is not tagged?

- On lines 268-269 you say: "most of the papers have positive (towards 1.5) sentiments". What I see from the figure is that most papers have a score close to 1.

- L. 269: "confidence is directly linked to positive expression of sentiment." How? Please include a reference.

- Lines 281-283: "The analysis of our corpus showed that the articles marked with non-confidence had either below 25% of medical terminology, or above 40%." What kind of analysis did you run? Please explain.

- L. 298: "machine learning techniques": language-wise, you can't enrich a corpus with machine learning. You would probably use machine learning to add, e.g., annotations. Which machine learning technique do you think is applicable?

- Your article processes English texts. I) Make this clear. II) How do you make sure that initials or capitalized words are treated correctly?

- Which environments are you using? You only mention the POS tagger, but I reckon the readers would like to know more details, e.g. the language and packages.

- Strictly speaking, you don't have a running system whose "architecture" you can present, as the title of section 4 denotes.

- L. 272: "Oxford Academy", please omit this and just use the header you have.

- L. 258-262: I don't believe you need to write out the reasons for using "sentiment analysis".

 

Language: overall, it is understandable, but I have noticed many issues. I recommend that the authors use a language editor, or ask a colleague to check.

Examples:

- l. 14: "seized" -> stopped

- l. 20: "effectiveness" -> effective

- l. 37: "betray" -> well, you betray an army or your country, but not an "author"; use "reveal" instead

- l. 78: don't use "but"; use e.g. "however"

- l. 179, l. 182: "showed" -> "shown"

- l. 180: "poses the accent". What do you mean? Probably a bad translation

- l. 275: "rage" -> "range"


Author Response

Thank you for your review and for taking the time to provide such constructive feedback.

Our instrument for detecting author confidence is based on three pipelined modules: a lexical, a syntactic and a semantic one. The figures for the three modules were slightly modified in the paper to make the required steps clearer. The lexical module tokenizes and lemmatizes the text and extracts the frequency of medical terms and the sentence length. The syntactic module identifies parts of speech, then active/passive voice and 1st/3rd person. The semantic module analyses the overall sentiment of the text, i.e. its polarity (negative or positive). This information is combined with the number of views of the article, the number of citations per author and per paper, and the number of cited papers, and a final decision regarding the confidence of the author is issued.

 

We used malaria to create the database. The reason for selecting a specific disease was that we expect articles to be comparable with regard to the medical terms they use.

 

We changed the name of section 1 as suggested, and we also made the modifications suggested regarding the inconsistencies.


Author Response File: Author Response.pdf

Reviewer 3 Report

This work represents an attempt to identify the confidence factor of a biomedical article based on text analysis and to relate it to actions people may take based on it. It focuses on a subgroup of biomedical articles related to malaria, and the analysis is based on the way authors have presented their work. The methodology uses a known set of tools for this kind of analysis. However, the results are modestly presented, without a solid analysis or a test set that would confirm the statements made by the authors. In this way, the results are left to the readers' interpretation, and another way is needed to attest to the validity of the method. Without that, it is hard to see the contribution of this work. However, I hope that it can be provided in future studies.

Author Response

Thank you for your review and for taking the time to share your knowledge with us through constructive feedback.

Our instrument for detecting author confidence is based on three pipelined modules: a lexical, a syntactic and a semantic one. The figures for the three modules were slightly modified in the paper to make the required steps clearer. The lexical module tokenizes and lemmatizes the text and extracts the frequency of medical terms and the sentence length. The syntactic module identifies parts of speech, then active/passive voice and 1st/3rd person. The semantic module analyses the overall sentiment of the text, i.e. its polarity (negative or positive). This information is combined with the number of views of the article, the number of citations per author and per paper, and the number of cited papers, and a final decision regarding the confidence of the author is issued.


Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Good now.

Author Response

Thank you for your reviews.

Reviewer 2 Report

The paper is better than the original submission, but I can still see some weaknesses that I believe need to be addressed in order for the paper to be accepted. Below, I'm listing my remarks.


Methodology: The paper says, l. 202-204: "Each of the three main analysers (lexical, syntactic and semantic) returned a score for each article, and the final step involved the concatenation of the intermediate scores, with specific weights, in order to obtain the final result". I think this is very fuzzy. A reader, in my opinion, would really expect a paper like this to show HOW. This is only outlined in the paper. I suggest that the authors include a more detailed description, show how the weights were actually derived, and present and discuss the actual formula that was used.

Language: I see improvements as far as the language is concerned, but the paper still needs some work and careful proofreading. Some points I noticed:

l. 163: "showed". The passive voice is still wrong here, while it was corrected in l. 158

Sentence length: although the authors show in the paper that sentence length contributes to reduced clarity, they themselves use long sentences rather often, e.g. l. 219-222 (one sentence!)

l. 28: “text analyses” -> “text analysis“

l. 41:  “identify” -> “identifying”

l. 149: “understanding of the article has to suffer”(!!!)

l. 176: “in the same time” -> “at the same time”

Heading of section 7: “Discussions” -> “Discussion”

Slightly confusing measurement figures: I have understood what the figures show, but it was unclear to me at first. The legend "no. of articles" (Fig. 6 and 7) might make someone believe that the chart displays various measurements for a varying number of documents the authors experimented with. The chart, though, displays the value the authors obtained for each document in their collection. I would suggest changing the x-axis legend to something like "Article ID". The x-axis legend of Fig. 8 is "Articles", so it probably means "Article ID", too.

Experiment: The output scores for each paper are fine (Figures 6-8), but I would like to see averages and standard deviations, too. It would also be interesting to comment, if possible, on the outliers, e.g. papers with sentiment -1.

Lines 31-33: you provide an incentive for your work. Some existing bibliography would also be useful.

l. 84: You say the dataset contains 10,000 documents, but in the figures we can see values up to 12,000-something. Please be more specific.

Methodology: I think it would be a good idea to write a subsection on implementation. The sentence about NLTK seems a bit disconnected right now. There are some references that might need more explanation: why have you used the RACAI POS tagger and not something that NLTK offers? It seems excessive to me, unless you have a good reason. Why have you used Stanford Sentiment Analysis and not something else?

It would be nice to provide a link to a code repository with the source code you have created (as long as you are allowed to do that by your funding authorities).

I would change Fig. 2 to show a paper as input and the decision of whether it is written in a confident tone as output.

l. 132: please discuss the "various thresholds"

Have you used stopword removal in your preprocessing phase? Explain your choice.

I was thinking that, since you analyse the number of citations of a paper, you could also check a network metric like PageRank. I think that PageRank can rank even a relatively new paper high enough. Otherwise, it is really unclear how you combine paper citation metadata; see my comment on formulas, too.

How do you determine empirically the weights of each feature (l. 257-258)? Please describe this in your "methodology" instead of just referring to it in the Conclusions.

l. 269: what do you mean by "additional machine learning techniques"? Where have you used ML in your paper, such that you could use an additional ML technique?

l. 300: Does reference 9 need a Journal title?



Author Response

Thank you for taking the time to review our paper and help us improve it through your comprehensive review.

Regarding the weights, we moved the discussion to the Methodology section, as suggested. We also included a paragraph detailing the actual weights we used for each feature and how they combine to form the final score.

We made all suggested language improvements. Additionally, we used "Article ID" in Figures 6-8 for the x-axis, as suggested.

We corrected the errors that led to reporting a different number of articles in our collection. We have almost 12,000 collected documents.

We chose the RACAI POS tagger over NLTK's POS functionality because it facilitates the extraction of verbal voice, a feature that we use further on. We used Stanford Sentiment Analysis for convenience, since we had already used it in previous work.

Regarding stopword removal, we have stated that functional words are removed by the lexical analyzer.

Regarding "additional machine learning techniques", it was simply an error of using the adjective "additional" instead of the adverb "additionally".


Reviewer 3 Report

Thank you for replacing the chapter titles with much clearer and more conventional ones. There was no major improvement in the methodology compared to the previous version; however, the results are novel, and I support publishing the paper in its present form.

Author Response

Thank you for your reviews!
