A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article is devoted to solving one of the applied tasks of Named Entity Recognition using BERT. The topic of the article is relevant. The structure of the article does not conform to the format accepted in MDPI for research articles (Introduction (including analysis of analogues), Models and Methods, Results, Discussion, Conclusions). The level of English is acceptable. The article is easy to read. The figures in the article are of acceptable quality. The article cites 40 sources, some of which are outdated.
The following remarks and recommendations can be made regarding the material of the article:
1. The solution of the NER task begins with data preparation. Attention should be paid to potential problems such as annotator errors and ambiguous annotations. Additionally, some types of entities occur much less frequently than others, which creates an imbalance in the data. This may cause the model to perform worse in recognizing rare entities. Finally, it should be noted that many characters have very similar shapes and may have different meanings depending on the context, which complicates their correct recognition as entities. Please explain how these aspects of data preparation were taken into account by the authors.
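The class imbalance raised in this comment is commonly mitigated by weighting the training loss by inverse label frequency, so that rare entity tags contribute more per example. A minimal sketch, using hypothetical BIO label counts (the tag names and numbers are illustrative, not taken from the article):

```python
from collections import Counter

# Hypothetical BIO tag counts from an annotated pig-disease corpus;
# rare entity types (e.g. B-DRUG) are heavily outnumbered by "O" tags.
label_counts = Counter({"O": 50000, "B-DISEASE": 1200, "I-DISEASE": 1800,
                        "B-SYMPTOM": 900, "I-SYMPTOM": 1100,
                        "B-DRUG": 150, "I-DRUG": 200})

total = sum(label_counts.values())
num_labels = len(label_counts)

# Inverse-frequency weights, normalized so that the corpus-weighted
# average weight is 1.0; these can be passed to a weighted
# cross-entropy loss during training.
weights = {lab: total / (num_labels * cnt) for lab, cnt in label_counts.items()}
```

Rare tags such as `B-DRUG` then receive a much larger weight than the dominant `O` tag, which counteracts the tendency of the model to ignore infrequent entity types.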
2. BERT has many advantages, but when training it, two possible problematic points should be considered, namely, overfitting (the model can easily overfit the training data, especially if it is unbalanced or does not represent all possible variations) and difficulties in adaptation (adapting the pre-trained BERT model to specific tasks and domains requires careful tuning of hyperparameters and often additional methods such as regularization to avoid overfitting and improve the model's generalization ability). How do the authors address these issues?
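One standard guard against the overfitting described above is early stopping: checkpoint the model at its best validation score and halt fine-tuning once validation performance stops improving. A minimal, framework-agnostic sketch (the F1 history is hypothetical):

```python
class EarlyStopping:
    """Stop fine-tuning when the validation score stops improving
    for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_score):
        """Record one epoch's validation score; return True to stop."""
        if val_score > self.best + self.min_delta:
            self.best = val_score      # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience

# Hypothetical per-epoch validation F1 for a BERT fine-tuning run:
stopper = EarlyStopping(patience=2)
history = [0.71, 0.78, 0.81, 0.80, 0.79, 0.79]
stopped_at = next(i for i, f1 in enumerate(history) if stopper.step(f1))
# Training stops at epoch 4, keeping the epoch-2 checkpoint (F1 = 0.81).
```

In practice this is combined with the other remedies the comment mentions, such as dropout and weight decay during hyperparameter tuning.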
3. In their model, the authors use contrastive learning. This is progressive but associated with several challenges. Effective contrastive learning depends on the correct selection of positive and negative pairs (or triplets). Incorrect selection can lead to poor convergence and low model efficiency. It should be noted that finding high-quality and diverse negative examples is difficult, especially in large and complex datasets. If the negative examples are too similar to the positive ones, learning can be ineffective. However, if the model focuses too much on negative examples, this can lead to overfitting and a decrease in overall performance. How do the authors take these points into account?
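The pair-selection issue raised here can be made concrete with the widely used InfoNCE objective and a simple hard-negative heuristic. The sketch below is illustrative only (the function names and vectors are hypothetical, not the authors' implementation): it scores one anchor against a positive and several negatives, and shows the "hardest negatives" strategy the comment warns can backfire when negatives are near-duplicates of the positive.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the positive is pulled close and
    the negatives are pushed away in cosine-similarity space."""
    logits = np.array([cosine(anchor, positive)] +
                      [cosine(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))              # positive sits at index 0

def hard_negatives(anchor, candidates, k=2):
    """Pick the k candidates most similar to the anchor: informative
    negatives, but risky if they are near-duplicates of the positive."""
    sims = [cosine(anchor, c) for c in candidates]
    order = np.argsort(sims)[::-1][:k]
    return [candidates[i] for i in order]
```

Too-easy negatives drive the loss toward zero and teach little, while negatives almost identical to the positive can destabilize training, which is exactly the trade-off the comment asks the authors to address.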
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Authors:
Thank you for conducting this study, as you address a critical need in the field of pig science and applications of language processing and machine learning to better understand medical diagnostics/diagnoses.
Although your study is well written and contains a robust methods and discussion section, your study would be better understood if the introduction included potential use-cases or gaps in practice that pig scientists and non-experts could comprehend. For example, your very first paragraph outlines the problem of practice but no potential solutions that your study would address if successful. Here, please expand your introduction to address the problem and outline potential solutions and developments if your study was to be conducted. Then, transition to your study and explain that your work will fill gaps in the literature.
Otherwise, this study stands to make a fine contribution to the field.
Comments on the Quality of English Language
Only minor editing is necessary.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I formulated the following comments on the previous version of the article:
1. The solution of the NER task begins with data preparation. Attention should be paid to potential problems such as annotator errors and ambiguous annotations. Additionally, some types of entities occur much less frequently than others, which creates an imbalance in the data. This may cause the model to perform worse in recognizing rare entities. Finally, it should be noted that many characters have very similar shapes and may have different meanings depending on the context, which complicates their correct recognition as entities. Please explain how these aspects of data preparation were taken into account by the authors.
2. BERT has many advantages, but when training it, two possible problematic points should be considered, namely, overfitting (the model can easily overfit the training data, especially if it is unbalanced or does not represent all possible variations) and difficulties in adaptation (adapting the pre-trained BERT model to specific tasks and domains requires careful tuning of hyperparameters and often additional methods such as regularization to avoid overfitting and improve the model's generalization ability). How do the authors address these issues?
3. In their model, the authors use contrastive learning. This is progressive but associated with several challenges. Effective contrastive learning depends on the correct selection of positive and negative pairs (or triplets). Incorrect selection can lead to poor convergence and low model efficiency. It should be noted that finding high-quality and diverse negative examples is difficult, especially in large and complex datasets. If the negative examples are too similar to the positive ones, learning can be ineffective. However, if the model focuses too much on negative examples, this can lead to overfitting and a decrease in overall performance. How do the authors take these points into account?
The authors have addressed all my comments. I found their responses quite convincing. I support the publication of the current version of the article. I wish the authors creative success.