Project Report
Peer-Review Record

Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles

AI 2024, 5(3), 1670-1683; https://doi.org/10.3390/ai5030081
by Reanna K. Panagides 1,*, Sean H. Fu 2,*, Skye H. Jung 1, Abhishek Singh 1, Rose T. Eluvathingal Muttikkal 1, R. Michael Broad 2, Timothy D. Meakem 2 and Rick A. Hamilton 2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 7 August 2024 / Revised: 31 August 2024 / Accepted: 3 September 2024 / Published: 10 September 2024
(This article belongs to the Section AI Systems: Theory and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General comments

=============

Thank you for the opportunity to review your manuscript titled "Enhancing Focused Ultrasound Literature Review Through Natural Language Processing-Driven Text Classification." The topic is highly relevant, especially given the growing intersection of medical research and machine learning. However, while the manuscript introduces an intriguing concept, it would benefit from substantial elaboration throughout to fully realize its scholarly potential. In particular, the methods section needs more detail to ensure the reproducibility and clarity of your research approach.

 

Specific comments

=============

Major comments

---------------------

- Overall Recommendation:

  I recommend using a standardized checklist, such as PRISMA or a similar framework, for conducting and reporting literature reviews. This will help ensure that your manuscript adheres to recognized standards of completeness and transparency, addressing several points raised in this review.

 

- Title:

  The title of your manuscript would be more informative if it clearly indicated the study type. For example, specifying whether this is a pilot study or a preliminary study would provide readers with an immediate understanding of the scope and scale of your research. Given that your study employs specific NLP methods like BERT, this clarification would align the title more closely with the content.

 

- Introduction:

  - The introduction should include a more detailed explanation of focused ultrasound (FUS), with particular attention to FUS therapy and FUS technology. This will help readers unfamiliar with the field to better understand the context and significance of your work.

  - Additionally, the relationship between machine learning (ML) and literature review processes, especially in the context of FUS, needs to be more thoroughly explained. This background will help set the stage for the research question you are addressing.

  - The knowledge gap and research question should be explicitly stated, with a focus on the scientific rationale for applying NLP to literature reviews in the FUS domain. This will strengthen the purpose of your study.

 

- Methods:

  - Please provide a clear and detailed explanation of your inclusion and exclusion criteria, supported by a PRISMA-like flowchart. This is crucial for understanding how you selected the articles for your dataset. For instance, the statement “we incorporated more of these articles into our dataset” requires clarification—what criteria were used to decide which articles to include?

  - Clarify whether your study adhered to the text-mining policies of all publishers from which articles were sourced. This is important for addressing potential ethical concerns.

  - Describe how you validated that the FUSF Excel list is accurately related to FUS therapy. This step is critical for ensuring that your dataset is relevant and reliable.

  - Provide a rationale for choosing the BERT model. Explain why this particular NLP method was selected over others and how it is suited to your research objectives.

 

- Results:

  - Begin the results section with a basic background of the articles included in the study. This will provide context for your findings and help readers understand the dataset's composition before diving into more detailed analyses.

 

- Discussion:

  - The assertion that “BERT models can efficiently automate the classification of scientific literature” should be tempered. This statement currently appears to be an over-interpretation based on a single, specialized field. Consider rephrasing to reflect the limitations of your findings, especially regarding their applicability to other fields.

  - Discuss potential ways to generalize your findings to other areas of research or literature review processes. This would add value by showing how your approach could be adapted or expanded beyond the FUS field.

  - Address the limitations of your methods, particularly the biases inherent in your article selection process and the decision not to employ other NLP methods. Discuss how these limitations affect the generalizability of your findings.

 

=============

Minor comments

---------------------

  - Please ensure that all abbreviations are defined when first introduced, both in the manuscript body and in the tables. This includes abbreviations used in the abstract.

Author Response

August 19th, 2024

 

Dear AI Editorial Office,

We are pleased to resubmit our revised manuscript under the new title, “Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles,” for consideration for publication in AI, in the Section AI Systems: Theory and Applications. We sincerely appreciate both reviewers’ detailed and constructive feedback throughout this review.

In response to the reviewers’ comments, we have made major revisions to the manuscript. Below, we have provided a summary of the key changes made in accordance with the reviewers' comments. All other minor comments and suggestions have also been addressed. Please see below a detailed point-by-point response to the reviewers’ comments, alongside a highlighted version of the manuscript showing the changes made.

We believe that the revisions have significantly strengthened our manuscript, and we hope that it now meets the high standards for publication in the journal AI.

Sincerely,

 

Reanna Panagides

Sean Fu

Corresponding authors

 

 

 

Point-by-Point Response:

 

Reviewer #1

 

  1. The title of your manuscript would be more informative if it clearly indicated the study type. For example, specifying whether this is a pilot study or a preliminary study would provide readers with an immediate understanding of the scope and scale of your research. Given that your study employs specific NLP methods like BERT, this clarification would align the title more closely with the content.

 

Response: We agree with the reviewer that the current title leads to some confusion about the overall scope and methods of our study. We have revised the title to read as follows: “Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles” (page 1, highlighted in red). We believe that this new title aligns more closely with our study, which proposes a more efficient way to exclude articles not relevant to a specific topic and demonstrates how one may incorporate automation into the general literature review process.

 

  2. The introduction should include a more detailed explanation of focused ultrasound (FUS), with particular attention to FUS therapy and FUS technology. This will help readers unfamiliar with the field to better understand the context and significance of your work.

 

Response: Thank you for this good point. In this revision, we have included a more detailed explanation of focused ultrasound therapy and technology in introduction section 1.1 (page 1, paragraph 1, lines 29-39, highlighted in red).

 

  3. Additionally, the relationship between machine learning (ML) and literature review processes, especially in the context of FUS, needs to be more thoroughly explained. This background will help set the stage for the research question you are addressing.

 

Response: We agree with the reviewer and have included additional explanation in section 1.1 (page 2, paragraph 1, lines 50-56 and page 2, paragraph 2, lines 57-63, highlighted in red) of why machine learning is particularly well suited to focused ultrasound research: traditional article identification techniques cannot distinguish focused ultrasound studies from diagnostic ultrasound studies.

 

  4. The knowledge gap and research question should be explicitly stated, with a focus on the scientific rationale for applying NLP to literature reviews in the FUS domain. This will strengthen the purpose of your study.

 

Response: We appreciate this suggestion. New paragraphs at the end of section 1.1 and at the beginning of section 1.2 (page 2, paragraph 5, lines 89-101 and page 3, paragraph 1, lines 104-107, highlighted in red) were included to address this comment. We discuss how no machine learning model currently exists publicly to classify text as related or not related to FUS, thereby identifying a knowledge gap in the literature. These paragraphs allow for a smoother transition between the body of the introduction and the main purpose of the study.

 

We also pointed out one limitation of this study. We fine-tuned large language models and compared their performance to each other but did not then also synthesize knowledge by conducting an actual literature review of the relevant articles. We apologize for not making this clear in the initial submission. This point is now clarified in a new paragraph at the end of section 1.2 (page 3, paragraph 2, lines 115-121, highlighted in red).

 

  5. Please provide a clear and detailed explanation of your inclusion and exclusion criteria, supported by a PRISMA-like flowchart. This is crucial for understanding how you selected the articles for your dataset. For instance, the statement “we incorporated more of these articles into our dataset” requires clarification—what criteria were used to decide which articles to include?

 

Response: An additional explanation was added in section 2.1 (page 5, paragraph 1, lines 188-201, highlighted in red) outlining the keyword criteria that were used to select articles from the PubMed database. Terms such as “Focused Ultrasound” and “High Intensity Focused Ultrasound” were used to train the model on FUS-related articles. However, other keywords, such as “ultrasound imaging” and “diagnostic ultrasound,” were also included in the search even though they are not directly related to focused ultrasound. We now clarify that, to balance the training dataset and account for these similar but not directly FUS-related articles, we incorporated additional papers retrieved under the imaging and diagnostic ultrasound keyword searches, as classified by the clinical team (page 5, paragraph 3, lines 215-218, highlighted in red).
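The keyword-based selection and class balancing described in this response can be sketched as follows. This is a minimal illustration only: the term lists, function names, and truncation strategy are hypothetical stand-ins, not the authors' actual pipeline.

```python
# Hypothetical sketch of keyword-based labeling and dataset balancing.
# Articles retrieved under FUS-related search terms form the positive
# class; articles retrieved under imaging/diagnostic-ultrasound terms
# form the negative class used to balance the training set.

FUS_TERMS = ["focused ultrasound", "high intensity focused ultrasound"]
NON_FUS_TERMS = ["ultrasound imaging", "diagnostic ultrasound"]

def label_by_search_term(search_term: str) -> int:
    """Return 1 for FUS-related search terms, 0 for the balancing classes."""
    term = search_term.lower()
    if term in FUS_TERMS:
        return 1
    if term in NON_FUS_TERMS:
        return 0
    raise ValueError(f"unrecognized search term: {search_term}")

def balance_dataset(positives: list, negatives: list) -> tuple:
    """Truncate the larger class so both classes are equally represented."""
    n = min(len(positives), len(negatives))
    return positives[:n], negatives[:n]
```

In practice the final inclusion decision was made by the clinical team; a sketch like this only captures the mechanical keyword step.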

 

  6. Clarify whether your study adhered to the text-mining policies of all publishers from which articles were sourced. This is important for addressing potential ethical concerns.

 

Response: We added a clarification statement in section 2.1 (page 5, paragraph 2, lines 205-206, highlighted in red) stating that all text mining complied with the publishers’ policies, that no personal or sensitive information was scraped or shared, and that all articles were open access.

 

  7. Describe how you validated that the FUSF Excel list is accurately related to FUS therapy. This step is critical for ensuring that your dataset is relevant and reliable.

 

Response: We have now clarified this point in section 2.1 (page 5, paragraph 2, lines 207-209, highlighted in red). Prior to the manual classification of the Excel list articles into FUS-related or non-FUS-related by the clinical team, an initial screen was performed to ensure that all articles were related to FUS therapy. Any unrelated articles were removed from the list and not included in the dataset. Only after that were the remaining articles manually classified and used to train the BERT model.

 

  8. Provide a rationale for choosing the BERT model. Explain why this particular NLP method was selected over others and how it is suited to your research objectives.

 

Response: In section 1.3 as well as section 2.3, we added further information clarifying our reasons for selecting the BERT model specifically (page 4, paragraph 1, lines 154-166 and page 7, paragraph 7, lines 277-280, highlighted in red). Transformer models are the current standard for analyzing text data, which is the type of data we want to classify. The other models on our initial list, such as CNNs or RNNs, are better suited to other forms of data, such as visual or audio data. We therefore deemed BERT models the best fit for our dataset and application.

 

  9. Begin the results section with a basic background of the articles included in the study. This will provide context for your findings and help readers understand the dataset's composition before diving into more detailed analyses.

 

Response: We thank the reviewer for this suggestion. We have now included a short paragraph at the beginning of the results section (page 9, paragraph 4, lines 381-385, highlighted in red) to provide readers with the context of our findings.

 

  10. The assertion that “BERT models can efficiently automate the classification of scientific literature” should be tempered. This statement currently appears to be an over-interpretation based on a single, specialized field. Consider rephrasing to reflect the limitations of your findings, especially regarding their applicability to other fields.

 

Response: We agree with the reviewer and have reworded this statement to more accurately reflect the scope and scale of our project, removing the overgeneralizations (page 12, paragraph 6, lines 479-486, highlighted in red). We acknowledge that BERT models show a promising path to expanding beyond FUS-related fields but clarify that ultimately further studies are needed to test such a claim.

 

  11. Discuss potential ways to generalize your findings to other areas of research or literature review processes. This would add value by showing how your approach could be adapted or expanded beyond the FUS field.

 

Response: Thank you for this suggestion. The wealth of data in many different areas, particularly the biomedical and health sciences, continues to grow. With a re-optimization of the keywords fed to the web scraper, accessing articles from other fields is very feasible, and those data can then be used to fine-tune well-established BERT models as necessary. We have expanded on this point in the discussion (page 13, paragraph 1, lines 488-495, highlighted in red).

 

  12. Address the limitations of your methods, particularly the biases inherent in your article selection process and the decision not to employ other NLP methods. Discuss how these limitations affect the generalizability of your findings.

 

Response: The largest limitation of using BERT models for text classification is the amount of data needed to train the models for subject-specific classification tasks. These limitations do not necessarily affect the generalizability of our findings but rather make it difficult for individual researchers to obtain the amount of data required to train a model for their specific research purposes. This concept, among other identified limitations, was expanded upon in the discussion and conclusion sections of the paper (page 13, paragraphs 2 and 3, lines 505-508 and 511-520, highlighted in red).

 

  13. Please ensure that all abbreviations are defined when first introduced, both in the manuscript body and in the tables. This includes abbreviations used in the abstract.

 

Response: Thank you for this feedback. This has been addressed.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The presented report on the project is of scientific interest because it considers a fairly understandable and relevant problem: the classification of scientific articles according to whether they belong to a particular topic. In their work, the authors analyzed classification on the topic of focused ultrasound, which requires certain knowledge from the researcher during the initial selection of articles for analysis. As a tool for automating this task, the authors propose machine learning technologies, including models based on the BERT architecture. The authors presented an overview of the subject area and possible approaches to solving the task, provided metrics for evaluating models, described the general methodology of the study, conducted an experiment, and presented its results. On the whole, all the formal requirements for the article have been fulfilled. However, after studying the work, I have several questions that are worth paying attention to. The article is not of the largest volume, and it seems to me that some of its sections should be described and outlined in more detail.

1. Table 1 shows a larger number of different architectures than are used later in the experiment. I would like to see an explanation of why some architectures were dropped.

2. Section 3.2 needs to be described more broadly. Figure 1 shows only the general strategy; I would like to see examples of the implementation of the methodology, examples of using the model, and successful and unsuccessful attempts to classify articles. Thus, the authors should focus on the presentation of their solution.

3. Perhaps it is worth placing more emphasis on achieving the goals set in section 4.

4. Many References are provided as links to the arXiv database. On the one hand, this is good, since these are relevant and new studies. On the other hand, these References have not yet been reviewed, verified, and published. I think the authors need to add more References from well-known and peer-reviewed journals.

Author Response

August 19th, 2024

 

Dear AI Editorial Office,

We are pleased to resubmit our revised manuscript under the new title, “Enhancing Literature Review Efficiency: A Case Study on Using Fine-Tuned BERT for Classifying Focused Ultrasound-Related Articles,” for consideration for publication in AI, in the Section AI Systems: Theory and Applications. We sincerely appreciate both reviewers’ detailed and constructive feedback throughout this review.

In response to the reviewers’ comments, we have made major revisions to the manuscript. Below, we have provided a summary of the key changes made in accordance with the reviewers' comments. All other minor comments and suggestions have also been addressed. Please see below a detailed point-by-point response to the reviewers’ comments, alongside a highlighted version of the manuscript showing the changes made.

We believe that the revisions have significantly strengthened our manuscript, and we hope that it now meets the high standards for publication in the journal AI.

Sincerely,

 

Reanna Panagides

Sean Fu

Corresponding authors

 

Reviewer #2

 

  1. Table 1 shows a larger number of different architectures than will be used later in the experiment. I would like to see an explanation of why some architectures were dropped.

 

Response: It is important to note that although all methods mentioned in Table 1 have been used for text classification, the invention of transformers, such as Bidirectional Encoder Representations from Transformers (BERT), has been shown to significantly outperform all other types of traditional and deep learning methods for this NLP task. Although fine-tuning BERT models for text classification requires more computational power than training traditional models, we are interested in seeing how the performance of fine-tuned BERT models compares. We decided to investigate only the performance of transformers, over CNNs and RNNs, due to the literature supporting the use of transformers for text classification tasks over other deep learning methods. This explanation has been added in sections 1.3 and 2.3 (under Model Selection and Training, page 4, paragraph 1, lines 154-166 and page 7, paragraphs 1 and 2, lines 272-273 and 277-280, highlighted in red).

 

  2. Section 3.2. needs to be described more broadly. Figure 1 shows only the general strategy. I would like to see examples of the implementation of the methodology, examples of using the model, successful and unsuccessful attempts to classify articles. Thus, the authors should focus on the presentation of their solution.

 

Response: Thank you for this feedback. We have added a new section 3.3 (pages 10, 11, and 12, highlighted in red) that provides an example use-case walkthrough of this ML-assisted literature review workflow, along with examples of successful attempts to classify articles.
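The screening step at the heart of such an ML-assisted workflow can be sketched as below. This is a generic illustration only: in the actual workflow a fine-tuned BERT model produces the relevance score, whereas here `score_abstract` is a keyword-based stand-in so the example stays self-contained, and the article IDs are invented.

```python
# Illustrative sketch of ML-assisted literature screening: score each
# candidate abstract for relevance, keep only those above a threshold.
# `score_abstract` is a stand-in for the fine-tuned BERT classifier.

def score_abstract(abstract: str) -> float:
    """Stand-in relevance scorer (the study uses a fine-tuned BERT model)."""
    return 1.0 if "focused ultrasound" in abstract.lower() else 0.0

def screen_articles(articles: dict, threshold: float = 0.5) -> list:
    """Keep article IDs whose abstracts score at or above the threshold."""
    return [pmid for pmid, abstract in articles.items()
            if score_abstract(abstract) >= threshold]

candidates = {
    "PMID-A": "Focused ultrasound ablation of uterine fibroids ...",
    "PMID-B": "Diagnostic ultrasound imaging of the liver ...",
}
kept = screen_articles(candidates)  # only the FUS-related abstract survives
```

Articles that pass this automated screen would then proceed to human review, which is the division of labor the workflow relies on.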

 

  3. Perhaps it is worth placing more emphasis on achieving the goals set in section 4.

 

Response: We agree with the reviewer and have reorganized some sentences between our results section and our discussion section. The results section now presents purely the data regarding the comparison of evaluation metrics for the models and the process of integrating the selected models into the literature review process. Statements about choosing a model that aligns well with our initial goals and successfully integrating that model into the literature review process have been moved to the beginning of the discussion section (page 12, paragraph 5, lines 473-478, highlighted in red), so that the discussion begins with the accomplishment of our goals, followed by the limitations and future directions.
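For reference, the standard binary-classification quantities typically used in such model comparisons can be computed as in the sketch below. This is a generic illustration; the exact metric set reported in the manuscript may differ.

```python
# Standard binary-classification metrics from paired true/predicted labels,
# where label 1 means FUS-related and 0 means not FUS-related.

def classification_metrics(y_true, y_pred):
    """Return precision, recall, F1, and accuracy for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

For literature screening, recall on the relevant class is usually the metric to prioritize, since a missed relevant article is costlier than an extra one passed to human review.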

 

  4. Many References are provided by links to the arXiv database. On the one hand, this is good, since these are relevant and new studies. But, on the other hand, these References have not yet been reviewed, verified and published. I think the authors need to add more References from well-known and peer-reviewed journals.

 

Response: Thank you for this suggestion. Thirteen additional peer-reviewed articles from well-recognized journals have been added to the references in this revised manuscript (pages 14 and 15, highlighted in red).

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

General comments

=============

Thank you for the opportunity to review your manuscript titled "Enhancing Focused Ultrasound Literature Review Through Natural Language Processing-Driven Text Classification." The topic is highly relevant, especially given the growing intersection of medical research and machine learning. Almost all responses were reasonable.

Reviewer 2 Report

Comments and Suggestions for Authors

My comments have been taken into account, and the answers to them are quite detailed.
