Article
Peer-Review Record

MédicoBERT: A Medical Language Model for Spanish Natural Language Processing Tasks with a Question-Answering Application Using Hyperparameter Optimization

Appl. Sci. 2024, 14(16), 7031; https://doi.org/10.3390/app14167031
by Josué Padilla Cuevas 1, José A. Reyes-Ortiz 2, Alma D. Cuevas-Rasgado 1,*, Román A. Mora-Gutiérrez 2 and Maricela Bravo 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 July 2024 / Revised: 7 August 2024 / Accepted: 7 August 2024 / Published: 10 August 2024
(This article belongs to the Special Issue Techniques and Applications of Natural Language Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

A brief summary

The paper presents the process and outcomes of adapting a large pretrained general-domain language model to the medical domain. The tuning process for this model is also described.

The aim of the paper is to examine the feasibility of adapting an LLM to a specific domain even though it was pretrained on general texts such as English Wikipedia and Google Books.

The main contribution is MédicoBERT, a large language model adapted to the medical domain in Spanish.

Broad comments

1) Since the text is in English, the examples given in Spanish should also be supplemented with English translations.
2) Too much attention is paid to hyperparameter tuning. The article should present the selected parameter values, but an entire subsection need not be devoted to them.

Specific comments:

l. 54 - the abbreviation PLN is not explained before its first use
l. 134 - the abbreviation PLN is used again
l. 139 - Table 1 uses "ALBERTO" while the text here uses "ALBERTo"; the spelling should be consistent
l. 194 - the abbreviation PLN is used again
l. 307 - equation (1) is unclear: what does P(·) denote? (A standard reading is sketched after this list.)
l. 460 - the abbreviation PLN is used again
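
For context on the P(·) question: equations of this shape in language-model papers are usually the sequence perplexity. The following is a standard form, offered only as an assumed reading of the manuscript's equation (1), not a quotation of it:

```latex
% Standard perplexity of a language model over a token sequence W = w_1,...,w_N.
% P(w_i | w_1,...,w_{i-1}) is the probability the model assigns to token w_i
% given its preceding context. This is an assumed reading, not the manuscript's
% own equation (1).
\mathrm{PP}(W) = P(w_1, \ldots, w_N)^{-1/N}
             = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_1, \ldots, w_{i-1}) \right)
```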

Author Response

We appreciate the time you spent reviewing the manuscript. Please find our response to your comments in the attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors


Comments for author File: Comments.docx

Author Response

We appreciate the time you spent reviewing the manuscript. Please find our response to your comments in the attached file.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

MédicoBERT: A medical language model for Spanish NLP tasks with a Question-Answering Application using hyperparameter optimization

The authors introduce a pretrained model specifically designed for Spanish medical natural language processing tasks: MédicoBERT. The model fine-tunes BERT to adapt it to medical terminology and related vocabulary, particularly in areas such as diseases, treatments, symptoms, and medications. MédicoBERT was pretrained on 3 million medical texts containing 1.1 billion words. The paper describes the model's training and hyperparameter optimization process in detail, and fine-tunes the model on a corpus of over 34,000 Spanish medical questions and answers to evaluate its performance on medical question-answering tasks.
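
To make the fine-tuning step described in this summary concrete, the sketch below shows extractive question answering with a BERT encoder that predicts answer start/end positions, using the Hugging Face transformers API. The checkpoint (a public Spanish BERT used as a stand-in) and the single toy example are assumptions, not the authors' released model or their 34,000-pair corpus.

```python
# Minimal sketch of extractive QA fine-tuning: a BERT encoder learns to
# predict answer start/end token positions. The checkpoint and the single
# toy example are illustrative stand-ins, not the authors' actual setup.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

checkpoint = "dccuchile/bert-base-spanish-wwm-cased"  # public Spanish BERT, assumed stand-in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "¿Qué fármaco recibió el paciente?"
context = "El paciente recibió amoxicilina durante siete días."
answer = "amoxicilina"

inputs = tokenizer(question, context, return_tensors="pt")

# Map the answer's character span in `context` to token positions
# (sequence_index=1 selects the context half of the question/context pair).
start_char = context.index(answer)
start_tok = inputs.char_to_token(0, start_char, sequence_index=1)
end_tok = inputs.char_to_token(0, start_char + len(answer) - 1, sequence_index=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
outputs = model(
    **inputs,
    start_positions=torch.tensor([start_tok]),
    end_positions=torch.tensor([end_tok]),
)
outputs.loss.backward()  # cross-entropy over start/end logits
optimizer.step()
```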

 

Advantages

Domain-specific pretraining: MédicoBERT is designed specifically for Spanish medical NLP tasks. This domain-specific pretraining significantly improves performance on related tasks and fills a gap left by existing general-purpose models in this field.
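
As a hedged illustration of what such domain-adaptive pretraining typically looks like, the sketch below continues masked-language-model training of a general-domain BERT on medical sentences. The checkpoint and the sample texts are illustrative assumptions, not the authors' procedure.

```python
# Minimal sketch of domain-adaptive pretraining: continue masked-language-model
# (MLM) training of a general-domain BERT on medical text. The checkpoint and
# the two sample sentences are illustrative assumptions, not the authors' setup.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
)

checkpoint = "bert-base-multilingual-cased"  # assumed general-domain starting point
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

texts = [
    "El paciente presenta fiebre y cefalea desde hace tres días.",
    "Se indicó tratamiento con ibuprofeno cada ocho horas.",
]
features = [tokenizer(t, truncation=True) for t in texts]

# The collator pads the batch and randomly masks 15% of tokens for MLM.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
batch = collator(features)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
outputs = model(**batch)  # loss = cross-entropy on the masked positions
outputs.loss.backward()
optimizer.step()
```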

 

Hyperparameter optimization: The paper describes the model's hyperparameter optimization process, covering two approaches used for hyperparameter optimization and model calibration, a heuristic method and a nonlinear regression model, which improve the model's efficiency and performance.
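
A minimal sketch of the surrogate idea, under stated assumptions: evaluate a handful of sampled configurations, fit a nonlinear regression model to the observed scores, then rank a larger candidate pool by predicted score. The search space, the random-forest surrogate, and the stub evaluate() are illustrative choices, not the authors' actual procedure.

```python
# Minimal sketch of surrogate-assisted hyperparameter search: evaluate a few
# sampled configurations, fit a nonlinear regression surrogate to the results,
# then rank a larger candidate pool by predicted score. The search space and
# the stub evaluate() are illustrative assumptions, not the authors' setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def sample_configs(n):
    # columns: learning rate (log-uniform), batch size, epochs
    lr = 10 ** rng.uniform(-5.5, -4.0, n)
    batch = rng.choice([8, 16, 32], n)
    epochs = rng.integers(2, 6, n)
    return np.column_stack([lr, batch, epochs])

def evaluate(cfg):
    # Stand-in for an expensive training run returning a validation score.
    lr, batch, epochs = cfg
    return -abs(np.log10(lr) + 5.0) - 0.01 * batch + 0.1 * epochs

observed = sample_configs(20)
scores = np.array([evaluate(c) for c in observed])

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(observed, scores)

candidates = sample_configs(1000)
best = candidates[np.argmax(surrogate.predict(candidates))]
print("predicted-best config (lr, batch, epochs):", best)
```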

 

Rich dataset usage: The authors used multiple large medical datasets, including BioASQ, CORD-19, and CoWeSe, which cover a large amount of Spanish medical literature and strengthen the model's semantic understanding.

Shortcomings

Model comparison: Although the paper demonstrates MédicoBERT's strong performance on multiple tasks, the comparison with other existing Spanish medical NLP models is not comprehensive enough. A more detailed comparative analysis would demonstrate MédicoBERT's advantages more clearly.

Insufficient discussion of practical application scenarios: Although the model performs well on question-answering tasks, the details and potential challenges of applying it in real medical settings are not sufficiently discussed. Further case studies and feedback from practical deployments would help validate the model's practicality and stability.

Insufficient analysis of model errors and limitations: The paper lacks a detailed analysis of the errors the model makes and their causes. For example, there is no discussion of how or why MédicoBERT fails on specific types of questions; such an analysis would help identify the model's limitations. In addition, although the optimization process is described in detail, the specific impact of different hyperparameter configurations on model performance, and the challenges encountered during optimization, are not explored in depth.


Author Response

We appreciate the time you spent reviewing the manuscript. Please find our response to your comments in the attached file.

Author Response File: Author Response.pdf
