A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article is devoted to solving one of the applied tasks of Named Entity Recognition using BERT. The topic of the article is relevant. The structure of the article does not conform to the format accepted in MDPI for research articles (Introduction (including analysis of analogues), Models and Methods, Results, Discussion, Conclusions). The level of English is acceptable. The article is easy to read. The figures in the article are of acceptable quality. The article cites 40 sources, some of which are outdated.
The following remarks and recommendations can be made regarding the material of the article:
1. The solution of the NER task begins with data preparation. Attention should be paid to potential problems such as annotator errors and ambiguous annotations. Additionally, some types of entities occur much less frequently than others, which creates an imbalance in the data. This may cause the model to perform worse in recognizing rare entities. Finally, it should be noted that many characters have very similar shapes and may have different meanings depending on the context, which complicates their correct recognition as entities. Please explain how these aspects of data preparation were taken into account by the authors.
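The class imbalance raised in this comment is commonly mitigated by weighting the training loss by inverse label frequency, so that rare entity tags contribute more per example. A minimal sketch, using hypothetical BIO label counts (the tag names and numbers are illustrative, not taken from the article):

```python
from collections import Counter

# Hypothetical BIO tag counts from an annotated pig-disease corpus;
# rare entity types (e.g. B-DRUG) are heavily outnumbered by "O" tags.
label_counts = Counter({"O": 50000, "B-DISEASE": 1200, "I-DISEASE": 1800,
                        "B-SYMPTOM": 900, "I-SYMPTOM": 1100,
                        "B-DRUG": 150, "I-DRUG": 200})

total = sum(label_counts.values())
num_labels = len(label_counts)

# Inverse-frequency weights, normalized so that the corpus-weighted
# average weight is 1.0; these can be passed to a weighted
# cross-entropy loss during training.
weights = {lab: total / (num_labels * cnt) for lab, cnt in label_counts.items()}
```

Rare tags such as `B-DRUG` then receive a much larger weight than the dominant `O` tag, which counteracts the tendency of the model to ignore infrequent entity types.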
2. BERT has many advantages, but when training it, two possible problematic points should be considered, namely, overfitting (the model can easily overfit the training data, especially if it is unbalanced or does not represent all possible variations) and difficulties in adaptation (adapting the pre-trained BERT model to specific tasks and domains requires careful tuning of hyperparameters and often additional methods such as regularization to avoid overfitting and improve the model's generalization ability). How do the authors address these issues?
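One standard guard against the overfitting described above is early stopping: checkpoint the model at its best validation score and halt fine-tuning once validation performance stops improving. A minimal, framework-agnostic sketch (the F1 history is hypothetical):

```python
class EarlyStopping:
    """Stop fine-tuning when the validation score stops improving
    for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_score):
        """Record one epoch's validation score; return True to stop."""
        if val_score > self.best + self.min_delta:
            self.best = val_score      # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience

# Hypothetical per-epoch validation F1 for a BERT fine-tuning run:
stopper = EarlyStopping(patience=2)
history = [0.71, 0.78, 0.81, 0.80, 0.79, 0.79]
stopped_at = next(i for i, f1 in enumerate(history) if stopper.step(f1))
# Training stops at epoch 4, keeping the epoch-2 checkpoint (F1 = 0.81).
```

In practice this is combined with the other remedies the comment mentions, such as dropout and weight decay during hyperparameter tuning.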
3. In their model, the authors use contrastive learning. This is progressive but associated with several challenges. Effective contrastive learning depends on the correct selection of positive and negative pairs (or triplets). Incorrect selection can lead to poor convergence and low model efficiency. It should be noted that finding high-quality and diverse negative examples is difficult, especially in large and complex datasets. If the negative examples are too similar to the positive ones, learning can be ineffective. However, if the model focuses too much on negative examples, this can lead to overfitting and a decrease in overall performance. How do the authors take these points into account?
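The pair-selection issue raised here can be made concrete with the widely used InfoNCE objective and a simple hard-negative heuristic. The sketch below is illustrative only (the function names and vectors are hypothetical, not the authors' implementation): it scores one anchor against a positive and several negatives, and shows the "hardest negatives" strategy the comment warns can backfire when negatives are near-duplicates of the positive.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the positive is pulled close and
    the negatives are pushed away in cosine-similarity space."""
    logits = np.array([cosine(anchor, positive)] +
                      [cosine(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))              # positive sits at index 0

def hard_negatives(anchor, candidates, k=2):
    """Pick the k candidates most similar to the anchor: informative
    negatives, but risky if they are near-duplicates of the positive."""
    sims = [cosine(anchor, c) for c in candidates]
    order = np.argsort(sims)[::-1][:k]
    return [candidates[i] for i in order]
```

Too-easy negatives drive the loss toward zero and teach little, while negatives almost identical to the positive can destabilize training, which is exactly the trade-off the comment asks the authors to address.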
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Authors:
Thank you for conducting this study, as you address a critical need in the field of pig science and applications of language processing and machine learning to better understand medical diagnostics/diagnoses.
Although your study is well written and contains a robust methods and discussion section, your study would be better understood if the introduction included potential use-cases or gaps in practice that pig scientists and non-experts could comprehend. For example, your very first paragraph outlines the problem of practice but no potential solutions that your study would address if successful. Here, please expand your introduction to address the problem and outline potential solutions and developments if your study was to be conducted. Then, transition to your study and explain that your work will fill gaps in the literature.
Otherwise, this study stands to make a fine contribution to the field.
Comments on the Quality of English Language
Only minor editing is necessary.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I formulated the following comments on the previous version of the article:
1. The solution of the NER task begins with data preparation. Attention should be paid to potential problems such as annotator errors and ambiguous annotations. Additionally, some types of entities occur much less frequently than others, which creates an imbalance in the data. This may cause the model to perform worse in recognizing rare entities. Finally, it should be noted that many characters have very similar shapes and may have different meanings depending on the context, which complicates their correct recognition as entities. Please explain how these aspects of data preparation were taken into account by the authors.
2. BERT has many advantages, but when training it, two possible problematic points should be considered, namely, overfitting (the model can easily overfit the training data, especially if it is unbalanced or does not represent all possible variations) and difficulties in adaptation (adapting the pre-trained BERT model to specific tasks and domains requires careful tuning of hyperparameters and often additional methods such as regularization to avoid overfitting and improve the model's generalization ability). How do the authors address these issues?
3. In their model, the authors use contrastive learning. This is progressive but associated with several challenges. Effective contrastive learning depends on the correct selection of positive and negative pairs (or triplets). Incorrect selection can lead to poor convergence and low model efficiency. It should be noted that finding high-quality and diverse negative examples is difficult, especially in large and complex datasets. If the negative examples are too similar to the positive ones, learning can be ineffective. However, if the model focuses too much on negative examples, this can lead to overfitting and a decrease in overall performance. How do the authors take these points into account?
The authors have addressed all my comments. I found their responses quite convincing. I support the publication of the current version of the article. I wish the authors creative success.