Open Access Article

Peer-Review Record

Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

Big Data Cogn. Comput. 2019, 3(1), 13; https://doi.org/10.3390/bdcc3010013

by Allard J. van Altena *, Perry D. Moerland, Aeilko H. Zwinderman and Sílvia Delgado Olabarriaga

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 16 January 2019 / Revised: 30 January 2019 / Accepted: 1 February 2019 / Published: 6 February 2019

Round 1

Reviewer 1 Report

This manuscript explores the value of the term «Big Data» when it is included or not in biomedical publications, with data retrieved from PubMed and PubMed Central. It is a very interesting topic, especially as Big Data research has been on the rise in Health Informatics over the past years. However, some issues need to be addressed before this manuscript can be reconsidered for publication.

The literature review needs to be enhanced by more relevant citations on the subject; 19 references are not enough.

In addition, establish the connection between this work and previously published work in the Journal of “Big Data and Cognitive Computing”.

Place the sections “Corpus Collection” and “Dataset Preparation” as subsections under the heading “Data and Methods”.

Use the PRISMA diagram (guidelines) in order to depict how the publications were selected, and replace Figure 1.

The section “Classification” should be renamed to “Results”.

Avoid the use of bold font unless absolutely necessary.

Make the figure captions more concise, and include the rest of the information in the main body of the manuscript.

Fix Figure 6; there is an overlapping of the value and the year in the intersection of the x and y axes.

Author Response

Point 1: The literature review needs to be enhanced by more relevant citations on the subject; 19 references are not enough.

Response: We have reiterated the search for related work and added relevant citations to both the introduction and related work sections.

Point 2: In addition, establish the connection between this work and previously published work in the Journal of “Big Data and Cognitive Computing”.

Response: We explicitly searched for related work in this journal, which yielded one paper that we added.

Point 3: Place the sections “Corpus Collection” and “Dataset Preparation” as subsections under the heading “Data and Methods”.

Response: We have added the section heading as suggested. “Corpus Collection” and “Dataset Preparation” were changed into subsections and minor text edits were made to reflect this change.

Point 4: Use the PRISMA diagram (guidelines) in order to depict how the publications were selected, and replace Figure 1.

Response: We applied the PRISMA guidelines[1] to the publication inclusion diagram. However, we had to make some adjustments to the text, as the guidelines are not fully applicable to our kind of research.

Point 5: The section “Classification” should be renamed to “Results”.

Response: We have adjusted the section heading and made some minor adjustments to the text to reflect this change.

Point 6: Avoid the use of bold font unless absolutely necessary.

Response: Bold font was mostly used to denote paragraphs. Where necessary we changed the bold font to a paragraph heading. Paragraph headings that weren’t absolutely necessary were removed from the text.

Point 7: Make the figure captions more concise, and include the rest of the information in the main body of the manuscript.

Response: Thank you for your comment. We agree that several captions were quite extensive and repeated the main body. We have adjusted the captions of Figures 1, 3, and 5 to remove the superfluous text.

Point 8: Fix Figure 6; there is an overlapping of the value and the year in the intersection of the x and y axes.

Response: Thanks for this observation. We have adjusted the y-axis so that the labels no longer overlap.

[1] Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Journal of Clinical Epidemiology 2009, 62, 1006–1012. https://doi.org/10.1016/j.jclinepi.2009.06.005

Reviewer 2 Report

In this paper, the authors proposed a text mining approach based on the term big data in biomedical publications. The authors generated 100 classifiers that could correctly distinguish between big data and non-big data documents. This paper would be interesting to the readers of BDCC. The following comments are provided to the authors.

(1) The abbreviation of ROC in the abstract should be presented.

(2) In Section 3, 3000/12/31 would best be changed to 2018/05/13.

(3) In Section 5, “Note that the ROC curves and FOR curve do not show trends along time.” However, the trend of big data may be expected in the future. The authors are advised to address this issue.

(4) In Section 6, the limitation to the approach may be described in a new Sub-section.

(5) The conclusion section should be improved to highlight the novelty of this paper.

Author Response

Point 1: The abbreviation of ROC in the abstract should be presented.

Response: Thanks for this comment. We have explained the term in the first mention of ROC in the abstract.

Point 2: In Section 3, 3000/12/31 would best be changed to 2018/05/13.

Response: In the paper we report the search parameters that were actually used in our experiments, which correspond to the default settings in PubMed when no end date is given.

It is true that it would have been more precise to limit the search to the date on which it was performed; however, we chose not to. We therefore consider the current method description correct and did not edit this date. We did, however, add a sentence to Section 3.1.1 to clarify this:

“Note that 3000/12/31 is the default value that PubMed uses when no end date is given.”
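To illustrate the difference between the two end dates discussed in this point, the following is a minimal sketch of how a date-restricted PubMed query could be expressed through the NCBI E-utilities ESearch endpoint. It is not the authors' actual search: the query term, the start date, and the helper name `build_esearch_url` are hypothetical placeholders, and the far-future default simply mirrors the 3000/12/31 value reported above. The sketch only builds the request URL and performs no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, mindate, maxdate="3000/12/31"):
    """Build an ESearch URL restricted to a publication-date range.

    The default maxdate mirrors the open-ended value reported in the
    response above (an assumption for illustration); passing an explicit
    maxdate pins the search to a reproducible cut-off date.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,  # format YYYY/MM/DD
        "maxdate": maxdate,
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Hypothetical query term and start date, for illustration only.
open_ended = build_esearch_url('"big data"', "2000/01/01")
pinned = build_esearch_url('"big data"', "2000/01/01", maxdate="2018/05/13")
print(open_ended)
print(pinned)
```

The two URLs differ only in the `maxdate` parameter. With the open-ended date, rerunning the query later may return additional records, whereas the pinned date keeps the date window fixed.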

Point 3: In Section 5, “Note that the ROC curves and FOR curve do not show trends along time.” However, the trend of big data may be expected in the future. The authors are advised to address this issue.

Response: Thanks for this comment. We realize that our argument was not clear. There is a growing trend to use the term, but our statement concerns distinguishability over time.

For clarification, we added the following sentence to Section 5:

“There is a clear increase in the usage of the term Big Data in biomedical literature along time. Here however our focus is in changes over time regarding distinguishably between papers that use the term and that do not.”

Point 4: In Section 6, the limitation to the approach may be described in a new Sub-section.

Response: We have consolidated all described limitations under a new subsection and placed this at the end of the discussion section.

Point 5: The conclusion section should be improved to highlight the novelty of this paper.

Response: Thanks for this suggestion. We have added the following to the conclusion:

“In this research we investigated the question whether Big Data literature in the biomedical field can be distinguished from literature that does not use the term. To our best knowledge, this is the first study to analyse this question using quantitative methods in this research field.”

Round 2

Reviewer 1 Report

The authors have addressed all comments from the previous review round, and the manuscript has been significantly improved. The only minor issue I see is that the Results section is only 10 lines long. Either merge it with the Discussion (i.e. Results and Discussion), or include more relevant information in this section, for example from Section 3.3, Classification.
