Peer-Review Record

Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu

Computers 2024, 13(10), 258; https://doi.org/10.3390/computers13100258
by Fida Ullah, Alexander Gelbukh *, Muhammad Tayyab Zamir, Edgardo Manuel Felipe Riverón and Grigori Sidorov
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 31 July 2024 / Revised: 22 September 2024 / Accepted: 30 September 2024 / Published: 10 October 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a significant contribution to Named Entity Recognition (NER) in low-resource languages, specifically Urdu. The study explores the challenges of NER in Urdu and proposes data augmentation techniques combined with BERT models to improve NER performance.

 

Strengths

1. Novelty: The use of data augmentation to expand the Urdu NER dataset is innovative and addresses the critical issue of limited training data.

2. Comprehensive Evaluation: The paper evaluates multiple transformer models (BERT Multilingual, Roberta Urdu Small, BERT Base Case, and BERT Large Case) and compares their performances, providing a thorough analysis.

3. Significant Results: The reported improvement in NER performance, particularly the highest macro F1 score of 0.982 achieved by the BERT Multilingual model, demonstrates the effectiveness of the proposed approach.

4. Detailed Methodology: The methodology section is well-detailed, describing the data augmentation process, model training, and evaluation metrics clearly.

 

Areas for Improvement

1. Clarity and Organization:

   - Introduction and Related Work: The introduction could be more concise, focusing on the key contributions of the paper. The related work section could be better integrated to highlight how the proposed work builds on and differs from existing research.

   - Segmentation: The paper can benefit from clearer segmentation between different sections. For example, separating the description of challenges faced in Urdu NER and the proposed solutions into distinct sections could improve readability.

2. Technical Details:

   - Data Augmentation Process: While the data augmentation technique is described, the specific steps and algorithms used for the augmentation could be detailed further. Including pseudo-code or an algorithmic flowchart might help readers replicate the process.

   - Model Parameters and Training Details: More information on the hyperparameters, training duration, and computational resources used would provide a clearer picture of the experimental setup.

3. Evaluation and Comparison:

   - Baseline Comparisons: The paper could include a comparison with more baseline models or traditional approaches to further highlight the improvements made by the proposed method.

   - Statistical Significance: Including statistical tests to verify the significance of the improvements in performance would strengthen the claims made.

4. Discussion and Implications:

   - Broader Implications: While the discussion focuses on the results, it could be expanded to include broader implications for NER in other low-resource languages. Discussing potential limitations and future work would provide a more rounded perspective.

   - Error Analysis: An analysis of common errors made by the models could provide insights into areas where further improvements are needed.

5. Figures and Tables:

   - Figure Clarity: Some figures, such as the architecture diagram and results comparison chart, could be made more legible with better labeling and higher resolution. Figures 1 and 2 must be in English.

   - Additional Visualizations: Including confusion matrices or detailed breakdowns of performance across different entity types could provide a more granular view of the model’s strengths and weaknesses.

6. Writing Style:

   - Language and Grammar: Some sections of the paper have minor grammatical errors and awkward phrasing. A thorough proofreading would help enhance clarity and professionalism.

   - Conciseness: Certain parts of the paper could be made more concise to avoid redundancy and improve the overall flow of the manuscript.

Comments on the Quality of English Language

1. Technical Terminology: The paper uses appropriate technical terms related to Natural Language Processing (NLP) and Named Entity Recognition (NER), demonstrating a good command of subject-specific language.

2. Detailed Descriptions: The authors provide detailed explanations of their methodology and results, which helps in understanding the technical aspects of their research.

Areas for Improvement:

1. Grammar and Syntax:

   - Run-On Sentences: Some sentences are lengthy and contain multiple ideas, making them difficult to read. For example:

     - "The obtained dataset underwent cleaning procedures employing various techniques, including the removal of stop words, commas, and semi-colons."

     - Suggested revision: "The dataset was cleaned using various techniques, such as removing stop words, commas, and semi-colons."

   - Sentence Structure: Varying sentence structure can enhance readability. For example:

     - "NER minimizes both time and energy consumption levels in the identification process."

     - Suggested revision: "NER reduces both the time and energy required for the identification process."

2. Punctuation:

   - Comma Usage: Some sentences lack necessary commas, which can lead to confusion. For example:

     - "In this research we present an enhanced NER system for the Urdu script by leveraging multilingual BERT and introducing a novel data augmentation technique known as Contextual Word Embedding's Augmentation (CWEA)."

     - Suggested revision: "In this research, we present an enhanced NER system for the Urdu script by leveraging multilingual BERT and introducing a novel data augmentation technique known as Contextual Word Embedding's Augmentation (CWEA)."

   - Apostrophes: Correct the misuse of apostrophes. For example:

     - "Contextual Word Embedding's Augmentation (CWEA)"

     - Correct usage: "Contextual Word Embeddings Augmentation (CWEA)"

3. Word Choice:

   - Technical Precision: Ensure that technical terms are used precisely. For example:

     - "We were used augmentation method to increase the amount of text."

     - Suggested revision: "We used a data augmentation method to increase the amount of text."

   - Avoiding Redundancy: Avoid repeating the same word or phrase unnecessarily. For example:

     - "The study explores the challenges of NER in Urdu and proposes data augmentation techniques combined with BERT models to improve NER performance."

     - Suggested revision: "The study explores the challenges of NER in Urdu and proposes data augmentation techniques combined with BERT models to enhance performance."

4. Clarity and Conciseness:

   - Simplifying Complex Sentences: Break down complex sentences to make them easier to understand. For example:

     - "The results indicate a notable enhancement in the NER system's overall performance when utilizing the extended dataset."

     - Suggested revision: "The results show a significant improvement in the NER system's performance with the extended dataset."

   - Avoiding Ambiguity: Ensure that each sentence conveys a clear and specific idea. For example:

     - "NER provides various functions for the Urdu language processing system."

     - Suggested revision: "NER enhances the efficiency of the Urdu language processing system by accurately identifying and categorizing named entities."

5. Proofreading:

   - Consistency: Ensure consistent use of terminology and abbreviations throughout the paper.

   - Typographical Errors: A thorough proofreading can help catch typographical errors and improve the overall quality of the manuscript.

Conclusion

The paper demonstrates a solid understanding of the subject matter and uses appropriate technical terminology. However, improving sentence structure, grammar, punctuation, and overall clarity would enhance readability and make the paper more professional. A thorough proofreading and revision process is recommended to address these issues.

Author Response

Dear Editor,

 

Thank you for your valuable comments on our article. We are grateful to the reviewers for their time and effort; their suggestions have substantially improved the revised manuscript. Changes are highlighted with Track Changes in the revised version. Our point-by-point response to the reviewers’ comments and suggestions is given below. (Manuscript ID: computers-3161738)

Manuscript title: Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu

(Changes in the manuscript are marked with track changes.)

The reviewers’ comments and suggestions on the manuscript “Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu” were exceptionally valuable: they were not only relevant but also played a substantial role in improving the quality of our manuscript. Our answers to the comments are given below.

 

Reviewer Comments:

Reviewer:1
Areas for Improvement

Clarity and Organization:

   - Introduction and Related Work: The introduction could be more concise, focusing on the key contributions of the paper. The related work section could be better integrated to highlight how the proposed work builds on and differs from existing research.

Reviewer comment: Segmentation: The paper can benefit from clearer segmentation between different sections. For example, separating the description of challenges faced in Urdu NER and the proposed solutions into distinct sections could improve readability.

Author’s Response: Dear Editor, we have organized all sections and subsections and numbered them accurately for better readability.

Reviewer comment: Technical Details:

   - Data Augmentation Process: While the data augmentation technique is described, the specific steps and algorithms used for the augmentation could be detailed further. Including pseudo-code or an algorithmic flowchart might help readers replicate the process.

Author’s Response: Dear Editor, as requested, we have added both the algorithmic steps and the pseudo-code used for the augmentation to the supplementary file.

Reviewer comment: Model Parameters and Training Details: More information on the hyperparameters, training duration, and computational resources used would provide a clearer picture of the experimental setup.

Author’s Response: Dear Reviewer, we have addressed this comment as requested.

Reviewer comment: Baseline Comparisons: The paper could include a comparison with more baseline models or traditional approaches to further highlight the improvements made by the proposed method. Statistical Significance: Including statistical tests to verify the significance of the improvements in performance would strengthen the claims made.

Author’s Response: Dear Reviewer, we have addressed these comments as suggested and incorporated the changes into the revised version.

Reviewer comment: Discussion and Implications:

   - Broader Implications: While the discussion focuses on the results, it could be expanded to include broader implications for NER in other low-resource languages. Discussing potential limitations and future work would provide a more rounded perspective.

   - Error Analysis: An analysis of common errors made by the models could provide insights into areas where further improvements are needed.

Author’s Response: Dear Editor, we have expanded the discussion of the broader implications for NER in other low-resource languages. We have also added the potential limitations and recommendations for future work to the Discussion section. Moreover, we have addressed your comments in the Error Analysis section.

Reviewer comment: Figures and Tables:

   - Figure Clarity: Some figures, such as the architecture diagram and results comparison chart, could be made more legible with better labeling and higher resolution. Figures 1 and 2 must be in English.

   - Additional Visualizations: Including confusion matrices or detailed breakdowns of performance across different entity types could provide a more granular view of the model’s strengths and weaknesses.

Author’s Response: Thank you for your suggestions. We have corrected all the grammatical mistakes in the revised version. We have also made the manuscript more concise for better readability.

Reviewer comment: Comments on the Quality of English Language

1. Technical Terminology: The paper uses appropriate technical terms related to Natural Language Processing (NLP) and Named Entity Recognition (NER), demonstrating a good command of subject-specific language.
2. Detailed Descriptions: The authors provide detailed explanations of their methodology and results, which helps in understanding the technical aspects of their research.

Author’s Response: Thank you for your compliment.

Reviewer comment: Areas for Improvement:

1. Grammar and Syntax:

   - Run-On Sentences: Some sentences are lengthy and contain multiple ideas, making them difficult to read. For example:

     - "The obtained dataset underwent cleaning procedures employing various techniques, including the removal of stop words, commas, and semi-colons."

     - Suggested revision: "The dataset was cleaned using various techniques, such as removing stop words, commas, and semi-colons."

   - Sentence Structure: Varying sentence structure can enhance readability. For example:

     - "NER minimizes both time and energy consumption levels in the identification process."

     - Suggested revision: "NER reduces both the time and energy required for the identification process."

Author’s Response: Thank you for your comments and suggestions. We have made all the requested corrections.


Reviewer comment: Punctuation:

   - Comma Usage: Some sentences lack necessary commas, which can lead to confusion. For example:

     - "In this research we present an enhanced NER system for the Urdu script by leveraging multilingual BERT and introducing a novel data augmentation technique known as Contextual Word Embedding's Augmentation (CWEA)."

     - Suggested revision: "In this research, we present an enhanced NER system for the Urdu script by leveraging multilingual BERT and introducing a novel data augmentation technique known as Contextual Word Embedding's Augmentation (CWEA)."

   - Apostrophes: Correct the misuse of apostrophes. For example:

     - "Contextual Word Embedding's Augmentation (CWEA)"

     - Correct usage: "Contextual Word Embeddings Augmentation (CWEA)"

Author’s Response: Thank you very much for identifying these errors. We have carefully reviewed them and made the corrections you suggested.

Reviewer comment: Word Choice:

   - Technical Precision: Ensure that technical terms are used precisely. For example:

     - "We were used augmentation method to increase the amount of text."

     - Suggested revision: "We used a data augmentation method to increase the amount of text."

   - Avoiding Redundancy: Avoid repeating the same word or phrase unnecessarily. For example:

     - "The study explores the challenges of NER in Urdu and proposes data augmentation techniques combined with BERT models to improve NER performance."

     - Suggested revision: "The study explores the challenges of NER in Urdu and proposes data augmentation techniques combined with BERT models to enhance performance."

Author’s Response: Dear Reviewer, we have made these corrections and marked them with track changes in the revised manuscript.

Reviewer comment: Clarity and Conciseness:

   - Simplifying Complex Sentences: Break down complex sentences to make them easier to understand. For example:

     - "The results indicate a notable enhancement in the NER system's overall performance when utilizing the extended dataset."

     - Suggested revision: "The results show a significant improvement in the NER system's performance with the extended dataset."

   - Avoiding Ambiguity: Ensure that each sentence conveys a clear and specific idea. For example:

     - "NER provides various functions for the Urdu language processing system."

     - Suggested revision: "NER enhances the efficiency of the Urdu language processing system by accurately identifying and categorizing named entities."

Repetition:

Author’s Response: Dear Reviewer, we have addressed all the comments.

Reviewer comment: Proofreading:

   - Consistency: Ensure consistent use of terminology and abbreviations throughout the paper.

   - Typographical Errors: A thorough proofreading can help catch typographical errors and improve the overall quality of the manuscript.

Author’s Response: Dear Reviewer, we have addressed all the suggestions related to terminology, abbreviations, and typographical errors in the revised manuscript.

Reviewer comment: Conclusion

The paper demonstrates a solid understanding of the subject matter and uses appropriate technical terminology. However, improving sentence structure, grammar, punctuation, and overall clarity would enhance readability and make the paper more professional. A thorough proofreading and revision process is recommended to address these issues.

Author’s Response: Dear Reviewer, we have made the corrections according to your comments.

Note: We welcome any further criticism, suggestions, or comments. Thank you all for your valuable input.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The research idea is interesting, but the article needs revision.

1. Much of the article is devoted to reviewing similar works: the Introduction and Related Work occupy about 6 pages, and the main research material (Methods, Results, and Discussion) occupies another 6 pages. I recommend halving the first 6 pages by removing well-known material. More attention should also be paid to the second part of the article; for example, the authors should describe Figure 1 better, conduct more experiments, and describe them.

2. Almost all figures are illegible because of their small font, especially Figures 2 and 3.

3. Formulas 1-3 are well known. It is not necessary to include them in the article; it is enough to state that you used them for the accuracy calculations.

Comments on the Quality of English Language

Minor editing of English language required

Author Response


 

Reviewer 2

Reviewer comment:

Much of the article is devoted to reviewing similar works: the Introduction and Related Work occupy about 6 pages, and the main research material (Methods, Results, and Discussion) occupies another 6 pages. I recommend halving the first 6 pages by removing well-known material. More attention should also be paid to the second part of the article; for example, the authors should describe Figure 1 better, conduct more experiments, and describe them.

Author’s Response: We have removed unnecessary sentences to shorten the Introduction and Related Work sections. Regarding Figure 1, a detailed description is provided in the revised manuscript.

Reviewer comment: 2. Almost all figures are illegible because of their small font, especially Figures 2 and 3.

3. Formulas 1-3 are well known. It is not necessary to include them in the article; it is enough to state that you used them for the accuracy calculations.

Author’s Response: Thank you for pointing this out. As requested, we have updated the figures with better quality and clearer labeling by increasing the font size. Furthermore, we have removed the well-known formulas and added references for readers.

Reviewer comment: Comments on the Quality of English Language

Minor editing of English language required.

Author’s Response: Dear Reviewer, we have corrected all the English and grammatical errors throughout the manuscript.

In light of your valuable suggestions, we have revised the manuscript to simplify complex sentences and remove redundant information for better clarity. These changes are marked with track changes in the manuscript.

We have ensured consistent use of terminology and units throughout the manuscript.

We have made extensive efforts to simplify and structure the paragraphs and sentences for clarity throughout the manuscript. These revisions are marked with track changes.

 

Note: We welcome any further criticism, suggestions, or comments. Thank you all for your valuable input.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents a robust approach to enhancing Named Entity Recognition (NER) for the Urdu language, a low-resource language, by leveraging data augmentation techniques and transformer-based models. The paper is well-structured, and the methodology appears sound, with comprehensive experiments and a thorough analysis of results.

 

However, several areas could be improved to enhance the clarity, readability, and impact of the paper. Below, I provide detailed comments and suggestions for the authors.

 

Major Comments:

1. Translation of Figures and Tables:

   - Comment: It is crucial that all figures, tables, and any textual content within them be translated into English. Currently, some examples and annotations in figures appear in Urdu or another language, which may not be accessible to all readers, particularly those who do not understand the language.

   - Recommendation: Please ensure that all non-English text is translated into English or accompanied by English explanations. This will make the paper more accessible to the broader academic community.

2. Clarity and Conciseness in the Introduction:

   - Comment: The introduction section, while informative, could be more concise. There are areas where the narrative could be streamlined to focus more directly on the key contributions of the paper.

   - Recommendation: Consider revising the introduction to highlight the primary challenges and contributions more succinctly. A more focused introduction will engage readers more effectively and set a clearer context for your research.

3. Integration of Related Work:

   - Comment: The related work section is comprehensive, but the integration of this work into the broader context of your research could be improved. Currently, it feels somewhat disconnected from the narrative of your methodology and contributions.

   - Recommendation: Better integration of the related work into the discussion of your methodology would clarify how your work builds on and differs from existing research. This could involve referencing specific methods from the related work in your methodological approach or directly contrasting your results with those of previous studies.

4. Segmentation of Content for Readability:

   - Comment: The paper could benefit from clearer segmentation between different sections. For instance, the challenges faced in Urdu NER and the proposed solutions could be more distinctly separated to improve readability.

   - Recommendation: Consider restructuring some sections to clearly demarcate the problem statement, challenges, and your proposed solutions. This would guide the reader through your research more intuitively.

5. Error Analysis Section:

   - Comment: The error analysis provided is valuable, but it could be expanded to offer more insight into why certain errors occur and how they might be mitigated in future work.

   - Recommendation: Delve deeper into specific types of errors, such as those related to tokenization or misclassification between categories. Discuss potential strategies for addressing these issues in future research, which would strengthen this section.

6. Broader Implications and Future Work:

   - Comment: While the discussion of results is thorough, it would benefit from a broader reflection on the implications of your findings for NER in other low-resource languages. Additionally, the future work section could be expanded to outline more concrete next steps.

   - Recommendation: Expand the discussion to consider how your approach could be generalized or adapted for other low-resource languages. In the future work section, consider proposing specific experiments or methodologies that could build on your current research.

Minor Comments:

1. Grammar and Style:

   - There are minor grammatical issues throughout the paper that could be addressed to improve readability. For example, some sentences are overly complex or awkwardly phrased.

   - Recommendation: A thorough proofreading or a professional editing service could be beneficial to ensure clarity and precision in the language.

2. Additional Visualizations:

   - Comment: The inclusion of more visualizations, such as detailed confusion matrices or performance breakdowns by entity type, would provide a more granular view of the model's strengths and weaknesses.

   - Recommendation: Consider adding these visualizations to give readers a clearer understanding of where your model performs well and where it struggles.

3. Hyperparameter and Training Details:

   - Comment: While you have included some details about the hyperparameters and training setup, more information could provide a clearer picture of the experimental setup.

   - Recommendation: Consider including a detailed table that lists all hyperparameters, training times, and computational resources used. This will make it easier for other researchers to replicate your work.

Comments on the Quality of English Language

The quality of English in the manuscript is generally good, but there are areas that could benefit from further refinement. Some sentences are overly complex or awkwardly phrased, which can affect readability. Additionally, there are minor grammatical issues throughout the text. A thorough proofreading or the use of a professional editing service would help improve clarity and ensure the language is precise and easy to understand.

 

Here are detailed suggestions for improvement:

 

1. Sentence Structure and Complexity:

   - Suggestion: Simplify complex sentences to enhance readability. Some sentences are long and contain multiple clauses, making them difficult to follow. Consider breaking these sentences into shorter, more direct statements.

   - Example: Instead of "The introduction section, while informative, could be more concise, and there are areas where the narrative could be streamlined to focus more directly on the key contributions of the paper," you could write, "The introduction is informative but could be more concise. Streamlining the narrative will help focus on the paper's key contributions."

2. Grammar and Punctuation:

   - Suggestion: Review the manuscript for minor grammatical errors, such as subject-verb agreement, improper use of articles (a, an, the), and punctuation issues.

   - Example: In a sentence like "The dataset was cleaned using various techniques, such as removing stop words, commas, and semi-colons," the comma after "semi-colons" is unnecessary and could be omitted for a cleaner sentence structure.

3. Verb Tense Consistency:

   - Suggestion: Ensure consistency in verb tenses throughout the manuscript. Switching between past and present tenses can confuse the reader and disrupt the flow of the text.

   - Example: If discussing past research, maintain the past tense: "The authors conducted experiments using..." instead of switching to the present tense in the same context.

4. Word Choice and Precision:

   - Suggestion: Choose words that are precise and accurately convey your intended meaning. Avoid using vague or overly general terms when a more specific term is available.

   - Example: Instead of "The model's performance was good," specify what aspect of performance you are referring to, such as "The model's accuracy was high," or "The model's precision in identifying entities was satisfactory."

5. Passive vs. Active Voice:

   - Suggestion: Use active voice where possible to make the text more engaging and direct. Passive voice can sometimes make sentences longer and more difficult to read.

   - Example: Instead of "The dataset was augmented by the authors," consider "The authors augmented the dataset."

6. Use of Technical Terminology:

   - Suggestion: While technical terms are necessary, ensure they are used correctly and consistently. If specific terms are introduced, define them clearly at first use.

   - Example: Ensure that terms like "CWEA augmentation" and "BERT multilingual" are introduced with adequate context so that all readers, even those less familiar with these terms, can follow along.

7. Redundancy and Repetition:

   - Suggestion: Avoid redundancy and repetition of ideas. Review each section to ensure that points are made clearly without unnecessary repetition.

   - Example: If a particular methodology or result is mentioned in multiple sections, consider consolidating these mentions to avoid redundancy.

8. Clarity in Explanations:

   - Suggestion: Clarify explanations, especially in the methodology and results sections, to ensure that complex ideas are conveyed clearly and understandably.

   - Example: When describing the BERT model's architecture, ensure that each component is clearly explained, possibly with examples or visual aids to help the reader understand how these components interact.

9. Proofreading:

   - Suggestion: A thorough proofreading of the manuscript is essential. Consider using a professional editing service or a native English speaker with experience in academic writing to review the manuscript for language issues.

   - Example: Proofreading can help catch errors such as missing articles, misplaced modifiers, and awkward phrasing that can detract from the overall quality of the writing.

10. Consistency in Terminology and Style:

   - Suggestion: Ensure that terminology and style are consistent throughout the manuscript. For instance, if you choose to use American English spelling, maintain it consistently instead of switching to British English spelling.

   - Example: Stick to one format for dates, references, and technical terms to avoid confusion.

 

Author Response

Round 2

Reviewer 1

 

Dear Editor,

 

Thank you for providing valuable comments to revise our article. We are thankful to the reviewers for their time and effort, as the revised version of the manuscript has been substantially improved with their suggestions. Changes have been highlighted in Track Changes in the revised version. Our point-by-point response to the reviewers’ comments and suggestions is given below. (Manuscript ID: computers-3161738)

 

 

Manuscript title: Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu

 

(Changes in the manuscript are marked with track changes.)

Regarding the manuscript “Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu,” the reviewers’ comments and suggestions were exceptionally valuable: they were not only relevant but also played a substantial role in improving the quality of our manuscript. Our answers to the comments are given below.

 

Reviewer Comments:

Reviewer:1
1.  Translation of Figures and Tables:

Comment:  It is crucial that all figures, tables, and any textual content within them be translated into English. Currently, some examples and annotations in figures appear in Urdu or another language, which may not be accessible to all readers, particularly those who do not understand the language.

Recommendation:  Please ensure that all non-English text is translated into English or accompanied by English explanations. This will make the paper more accessible to the broader academic community.

Authors Response: Thank you for your kind comments. Following your suggestion, we have translated all the Urdu sentences in the figures into English to ensure greater accessibility for the broader academic community. We believe this will enhance the inclusivity and reach of our work.

 

2. Clarity and Conciseness in the Introduction:

 

Comment:  The introduction section, while informative, could be more concise. There are areas where the narrative could be streamlined to focus more directly on the key contributions of the paper.

Recommendation:  Consider revising the introduction to highlight the primary challenges and contributions more succinctly. A more focused introduction will engage readers more effectively and set a clearer context for your research.

Author Response: We have thoroughly revised the introduction section to make it more concise, while clearly outlining the key contributions of the paper. Additionally, we have highlighted the primary challenges faced in our research to provide a clearer context for readers. These revisions aim to improve the clarity and focus of the introduction, ensuring that the significance of our work and its broader implications are more effectively communicated.

3. Integration of Related Work:

Comment:  The related work section is comprehensive, but the integration of this work into the broader context of your research could be improved. Currently, it feels somewhat disconnected from the narrative of your methodology and contributions.

Recommendation:  Better integration of the related work into the discussion of your methodology would clarify how your work builds on and differs from existing research. This could involve referencing specific methods from the related work in your methodological approach or directly contrasting your results with those of previous studies.

Author’s Response: Dear Editor, as requested, we have updated the related work section to improve the connection between previous studies and the methodologies and contributions presented in our paper. This ensures a smoother flow and shows how our work builds on and advances existing research in the field.

4. Segmentation of Content for Readability:

Comment:  The paper could benefit from clearer segmentation between different sections. For instance, the challenges faced in Urdu NER and the proposed solutions could be more distinctly separated to improve readability.

Recommendation:  Consider restructuring some sections to clearly demarcate the problem statement, challenges, and your proposed solutions. This would guide the reader through your research more intuitively.

Author’s Response: Dear Editor, as requested, we have thoroughly reviewed all sections and subsections of the manuscript to improve readability. This involved refining the language for clarity, improving the structure, and enhancing the flow of ideas. These revisions aim to improve the overall quality and coherence of the manuscript, making it easier to read and understand.

 

 

5. Error Analysis Section:

 

Comment:  The error analysis provided is valuable, but it could be expanded to offer more insight into why certain errors occur and how they might be mitigated in future work.

Recommendation:  Delve deeper into specific types of errors, such as those related to tokenization or misclassification between categories. Discuss potential strategies for addressing these issues in future research, which would strengthen this section.

Author’s Response: Thank you for your insightful feedback. We have addressed your comments by highlighting the factors that contribute to errors in the Urdu NER datasets. Additionally, we have provided a detailed explanation of potential strategies to reduce these errors, such as improving tokenization, handling language-specific challenges, and enhancing model architecture. Furthermore, we have outlined potential future research directions to address these issues and improve the overall accuracy of NER systems for Urdu and other low-resource languages.

6. Broader Implications and Future Work:

Comment:  While the discussion of results is thorough, it would benefit from a broader reflection on the implications of your findings for NER in other low-resource languages. Additionally, the future work section could be expanded to outline more concrete next steps.

Recommendation:  Expand the discussion to consider how your approach could be generalized or adapted for other low-resource languages. In the future work section, consider proposing specific experiments or methodologies that could build on your current research.

Author’s Response: Dear Editor, thank you for your valuable feedback. Following your suggestion, we have expanded the discussion to include a broader reflection on the implications of our findings for Named Entity Recognition (NER) in other low-resource languages. We now elaborate on how our results could inform the development of NER systems for low-resource languages, highlighting potential challenges and strategies that may be applied. This extended discussion underscores the relevance of our findings beyond Urdu and their potential contribution to advancing NER research in low-resource contexts. We have also proposed specific experiments and methodologies that could be employed in the education domain. Specifically, we suggest designing and implementing advanced deep learning models and large language models tailored to educational applications. These models would aim to improve the understanding and processing of language in educational contexts, contributing to better learning outcomes and content personalization. This extends the applicability of our research to other domains, particularly education, where NER systems and language models can have a significant impact.

Minor Comments:

  1. Grammar and Style:

There are minor grammatical issues throughout the paper that could be addressed to improve readability. For example, some sentences are overly complex or awkwardly phrased.

Recommendation:  A thorough proofreading or a professional editing service could be beneficial to ensure clarity and precision in the language.

Authors’ Response: Thank you for your valuable feedback. We have carefully reviewed the manuscript to address the minor grammatical issues and improve overall readability. Specifically, we have simplified overly complex sentences and rephrased awkwardly structured ones to ensure clearer and more concise communication. For proofreading, our co-author Prof. Dr. Edgardo Manuel Felipe Riverón thoroughly checked the whole manuscript and improved the English.

2. Additional Visualizations:

   -  Comment:  The inclusion of more visualizations, such as detailed confusion matrices or performance breakdowns by entity type, would provide a more granular view of the model's strengths and weaknesses.

   -  Recommendation:  Consider adding these visualizations to give readers a clearer understanding of where your model performs well and where it struggles.

Author Response: Thank you for your insightful suggestion. We appreciate your recommendation to include additional visualizations. In response, we have incorporated more detailed visualizations, such as confusion matrices and performance breakdowns by entity type, to provide a more granular view of the model's strengths and weaknesses. These visualizations offer a clearer understanding of where the model excels and where it faces challenges, further enriching the analysis and helping readers grasp the nuances of our results.

 

3. Hyperparameter and Training Details:

 Comment:  While you have included some details about the hyperparameters and training setup, more information could provide a clearer picture of the experimental setup.

Recommendation:  Consider including a detailed table that lists all hyperparameters, training times, and computational resources used. This will make it easier for other researchers to replicate your work.

Author Response:  Thank you for your valuable recommendation. In response, we have included detailed information regarding the hyperparameters, training duration, model parameters, training procedures, and the computational resources used. These additions aim to enhance the transparency and reproducibility of our experimental setup, providing a clearer understanding of the methodologies employed.

 

  1.   Sentence Structure and Complexity:  

 Suggestion:   Simplify complex sentences to enhance readability. Some sentences are long and contain multiple clauses, making them difficult to follow. Consider breaking these sentences into shorter, more direct statements.

 Example:   Instead of "The introduction section, while informative, could be more concise, and there are areas where the narrative could be streamlined to focus more directly on the key contributions of the paper," you could write, "The introduction is informative but could be more concise. Streamlining the narrative will help focus on the paper's key contributions."

Author Response: Thank you for your suggestion. We have simplified complex sentences throughout the manuscript by breaking them into shorter, more direct statements to enhance readability and clarity.

2. Grammar and Punctuation:

Suggestion:   Review the manuscript for minor grammatical errors, such as subject-verb agreement, improper use of articles (a, an, the), and punctuation issues.

 Example:   In a sentence like "The dataset was cleaned using various techniques, such as removing stop words, commas, and semi-colons," the comma after "semi-colons" is unnecessary and could be omitted for a cleaner sentence structure.

Author response: Thank you for your valuable feedback. We have carefully reviewed the manuscript to correct minor grammatical errors, including subject-verb agreement, article usage, and punctuation issues, ensuring a more polished and accurate presentation.

3. Verb Tense Consistency:

Suggestion:   Ensure consistency in verb tenses throughout the manuscript. Switching between past and present tenses can confuse the reader and disrupt the flow of the text.

Example:   If discussing past research, maintain the past tense: "The authors conducted experiments using..." instead of switching to the present tense in the same context.

Author Response: Thank you for your suggestion. We have revised the manuscript to ensure consistency in verb tenses throughout, particularly in sections discussing past research, to maintain a clear and coherent flow.

4. Word Choice and Precision:

  Suggestion:   Choose words that are precise and accurately convey your intended meaning. Avoid using vague or overly general terms when a more specific term is available.

 Example:   Instead of "The model's performance was good," specify what aspect of performance you are referring to, such as "The model's accuracy was high," or "The model's precision in identifying entities was satisfactory."

Author response: Thank you for your suggestion. We have carefully reviewed the manuscript to improve word choice and ensure precise terminology is used throughout, avoiding vague or overly general terms to enhance clarity and accuracy.

5. Passive vs. Active Voice:

Suggestion:   Use active voice where possible to make the text more engaging and direct. Passive voice can sometimes make sentences longer and more difficult to read.

Example:   Instead of "The dataset was augmented by the authors," consider "The authors augmented the dataset."

Author Response: Thank you for your insightful suggestion. We have revised the manuscript to incorporate more active voice where appropriate, making the text more engaging and direct. This helps reduce sentence length and improves readability.

6. Use of Technical Terminology:

 Suggestion:   While technical terms are necessary, ensure they are used correctly and consistently. If specific terms are introduced, define them clearly at first use.

 Example:   Ensure that terms like "CWEA augmentation" and "BERT multilingual" are introduced with adequate context so that all readers, even those less familiar with these terms, can follow along.

Author Response:  Thank you for your suggestion. We have carefully reviewed the manuscript to ensure that all technical terms are used correctly and consistently. Additionally, we have provided clear definitions for any specific terms upon their first introduction to ensure clarity for all readers.

7. Redundancy and Repetition:

 Suggestion:   Avoid redundancy and repetition of ideas. Review each section to ensure that points are made clearly without unnecessary repetition.

Example:   If a particular methodology or result is mentioned in multiple sections, consider consolidating these mentions to avoid redundancy.

Author Response: We have carefully reviewed the manuscript to eliminate any redundancy and repetition of ideas, ensuring that each point is made clearly and concisely without unnecessary repetition.

8. Clarity in Explanations:

 Suggestion:   Clarify explanations, especially in the methodology and results sections, to ensure that complex ideas are conveyed clearly and understandably.

 Example:   When describing the BERT model's architecture, ensure that each component is clearly explained, possibly with examples or visual aids to help the reader understand how these components interact.

Author Response: We have revised the manuscript to enhance clarity in the explanations, particularly in the methodology and results sections, ensuring that complex ideas are presented in a clear and easily understandable manner. We have also cited references that describe each of the models mentioned in the methodology section in detail.

9. Proofreading:

 Suggestion:   A thorough proofreading of the manuscript is essential. Consider using a professional editing service or a native English speaker with experience in academic writing to review the manuscript for language issues.

 Example:   Proofreading can help catch errors such as missing articles, misplaced modifiers, and awkward phrasing that can detract from the overall quality of the writing.

Author Response: Thank you for your suggestion. The manuscript has undergone thorough proofreading by Professor Dr. Edgardo Manuel Felipe Riverón to ensure language accuracy and clarity throughout.

10. Consistency in Terminology and Style:

Suggestion:   Ensure that terminology and style are consistent throughout the manuscript. For instance, if you choose to use American English spelling, maintain it consistently instead of switching to British English spelling.

Example:   Stick to one format for dates, references, and technical terms to avoid confusion.

Author Response: We have carefully reviewed the manuscript and used American English terminology and style throughout the document.

 

 

Note: We welcome any further criticism, suggestions, or comments. Thank you all for your valuable input.

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Accept in present form

Author Response

Round 2

Reviewer 2

 

Dear Editor,

 

Thank you for providing valuable comments to revise our article. We are thankful to the reviewers for their time and effort, as the revised version of the manuscript has been substantially improved with their suggestions. Changes have been highlighted in Track Changes in the revised version. Our point-by-point response to the reviewers’ comments and suggestions is given below. (Manuscript ID: computers-3161738)

 

 

Manuscript title: Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu

Reviewer Comments:

Reviewer:2

Comments and Suggestions for Authors

Accept in present form

Author Response: Thank you for taking the time to review our article. We greatly appreciate your valuable feedback and are grateful for the acceptance of our work.

Author Response File: Author Response.docx

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

 

   - Suggestions: The introduction, although informative, could benefit from being more concise. Streamlining the narrative to focus more directly on the key contributions and challenges would make the paper more engaging. The integration of related work could also be enhanced. Some of the cited studies are not fully integrated into the flow of the research, leading to a slight disconnection between past work and your proposed methodology. Consider briefly mentioning how your approach builds on or diverges from existing research earlier in the introduction.

 

   - Suggestions: It is essential that all figures and tables, particularly those containing text in Urdu, are accompanied by English translations or explanations. Although the manuscript contains translations for some examples, a few figures still contain Urdu text, which may not be accessible to a global audience. Please ensure all visual content is fully translated to make the paper more inclusive.

 

   - Suggestions: While the methodology section is comprehensive, a clearer breakdown of how the data augmentation was applied could provide additional clarity. Additionally, more details on the hyperparameters, model architecture, and computational setup, especially in Table 3, would make it easier for other researchers to replicate the study. A suggestion would be to provide more technical specifics on how the model parameters (e.g., learning rate, optimizer choices) affected performance during experimentation.

 

   - Suggestions: While the results show a clear improvement in performance with the augmented dataset, the analysis could benefit from a more detailed breakdown by entity type. The confusion matrix is a good addition, but more insights into why specific entity types (e.g., PERSON or ORGANIZATION) perform better or worse could deepen the reader's understanding. Expanding the error analysis to discuss potential reasons for misclassifications would strengthen the discussion.

 

   - Suggestions: As noted earlier, the integration of the related work into your study could be improved. Some of the studies are mentioned but not explicitly connected to how they influence your approach or findings. Linking specific methods from the literature to your methodology or results would clarify how your work builds on and differs from existing research.

 

   - Suggestions: Delve deeper into the factors contributing to the errors identified, such as misclassifications due to tokenization issues or ambiguity in certain named entities. Discussing strategies for mitigating these issues, such as experimenting with advanced tokenization techniques or expanding the diversity of training data, would provide valuable insight for future work.

 

   - Suggestions: Expanding on how the CWEA method and the multilingual BERT model could be generalized to other languages with similar challenges would strengthen the impact of the research. Highlighting potential applications beyond the Urdu language, such as in languages with different scripts or those that share certain structural features with Urdu, would broaden the paper's relevance.

 

   - Suggestions: The future work section could benefit from more specific proposals for next steps. For instance, exploring new domains such as education (as mentioned) or expanding the model to handle code-switching in multilingual contexts could be valuable. Additionally, consider proposing experiments with larger datasets or other language models to further explore the capabilities of your approach.

 

 

Comments on the Quality of English Language

 

 1. Clarity and Conciseness:

   - Issue: Certain sections, such as the introduction and related work, tend to be wordy and could benefit from a more concise structure.

   - Recommendation: 

     - In the introduction, focus on presenting the core problem, motivation, and contributions more directly. Avoid redundant or overly general statements.

     - For example, rather than reiterating historical facts about NER multiple times, focus on how these historical developments specifically relate to Urdu NER.

     - Revise long sentences into shorter, more direct ones for better readability.

 

 2. Grammar and Syntax:

   - Issue: While the overall grammar is satisfactory, there are occasional errors in word order, missing articles, and awkward phrasing. This slightly affects the readability and flow of the manuscript.

   - Recommendation:

     - Revise sentences with complex or convoluted structures. For instance:

       - Original: “The model we selected was initially trained in 104 languages, including Urdu, using self-supervised methods.”

       - Revised: “We selected a model pre-trained in 104 languages, including Urdu, using self-supervised methods.”

     - Ensure that articles (e.g., “the,” “a”) are consistently applied where needed.

     - Example: “It is crucial that all figures, tables, and textual content within them be translated…” should be revised to: “It is crucial that all figures, tables, and the textual content within them are translated…”

 

 3. Use of Non-English Text:

   - Issue: The manuscript includes some Urdu text, especially in figures, tables, and examples, which may not be accessible to all readers. This non-English content can make the manuscript difficult to understand for a broader audience.

   - Recommendation:

     - Ensure all instances of non-English text, especially in figures or examples, are accompanied by English translations. This applies particularly to the Urdu examples in Figure 2 and Table 5.

     - Example: In the figure where Urdu names or sentences are shown, include English translations directly beneath or within the figure captions.

     - When presenting tokenized examples (like "Mardan University"), provide an English equivalent alongside for better understanding by readers unfamiliar with Urdu script.

  

 4. Figures and Images:

   - Issue: Some figures contain Urdu text or annotations that are not fully translated into English. Additionally, there is a lack of clarity in some images, and the context of their inclusion could be better explained.

   - Recommendation:

     - Translate all textual content within figures and tables into English. For example, in Figure 2, the Urdu sentence should be translated fully into English within the figure or figure caption.

     - Ensure that images and figures are labeled clearly and explain how they contribute to the narrative in the main text. Each figure should have a clear description in the caption, and the figure should be referenced in the text.

     - Example: When you show a confusion matrix, explain what the key misclassifications mean in relation to the research in more detail.

 

 5. Terminology Consistency:

   - Issue: Some terminology is used inconsistently throughout the manuscript. For instance, named entity categories such as PERSON, LOCATION, ORGANIZATION, etc., are sometimes not consistently referenced.

   - Recommendation:

     - Ensure consistency in how key terms and entity labels are formatted and used throughout the text. For example, always use uppercase for the entity categories (e.g., PERSON, LOCATION) if that is the chosen convention, or keep them consistent throughout.

     - Maintain uniformity in how the various models (e.g., BERT, RNN) are referred to and discussed.

 

 6. Typographical Errors:

   - Issue: Minor typos, such as missing prepositions, misplaced commas, or inconsistent punctuation, occasionally occur throughout the text.

   - Recommendation:

     - Conduct a thorough proofreading to catch small errors such as misplaced commas or missing periods. For example, double-check sentences for proper punctuation after citations or at the end of clauses.

     - Example: "Additionally, the elimination of white spaces was carried out..." should be revised to, "Additionally, white spaces were removed..."

 

 7. Technical Jargon and Explanation:

   - Issue: Some of the technical jargon used in the paper may not be immediately clear to readers who are not deeply familiar with NLP, especially regarding model types and tokenization processes.

   - Recommendation:

     - Where appropriate, briefly explain technical terms, particularly those related to BERT and NER models, for readers who may not have specialized knowledge in these areas.

     - For example, a short explanation of what "WordPiece embedding" means and how it is relevant to NER would improve the accessibility of the paper to a broader academic audience.

 

 8. Flow and Organization:

   - Issue: The organization of some sections, particularly the transition between related work, methodology, and results, could be smoother.

   - Recommendation:

     - Use clearer transition phrases to guide the reader through different sections. For example, at the end of the related work section, explain how the previous research informs the choice of models and methods in your study.

     - Breaking down sections into more distinct subsections with descriptive headings will improve the flow.

Author Response

Round 3

Reviewer 1

 

Dear Editor,

 

Thank you for providing valuable comments to revise our article. We are thankful to the reviewers for their time and effort, as the revised version of the manuscript has been substantially improved with their suggestions. Changes have been highlighted in Track Changes in the revised version. Our point-by-point response to the reviewers’ comments and suggestions is given below. (Manuscript ID: computers-3161738)

 

 

Manuscript title: Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu

 

(Changes in the manuscript are marked with track changes.)

Regarding the manuscript “Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu,” the reviewers’ comments and suggestions were exceptionally valuable: they were not only relevant but also played a substantial role in improving the quality of our manuscript. Our answers to the comments are given below.

Suggestions: The introduction, although informative, could benefit from being more concise. Streamlining the narrative to focus more directly on the key contributions and challenges would make the paper more engaging. The integration of related work could also be enhanced. Some of the cited studies are not fully integrated into the flow of the research, leading to a slight disconnection between past work and your proposed methodology. Consider briefly mentioning how your approach builds on or diverges from existing research earlier in the introduction.

Author Response: Dear Reviewer, thank you for your valuable feedback. We appreciate your suggestion to make the introduction more concise and focused. In response, we have already streamlined the narrative to emphasize the key contributions and challenges of our research, so that the main points are highlighted more clearly and engagingly. We have also revised the integration of related work within the introduction, restructuring the discussion of cited studies so that they connect more seamlessly to our proposed methodology. To improve the flow, we now briefly mention earlier in the introduction how our approach builds on or diverges from existing research, providing clearer context for our contributions. We believe these revisions address your concerns and improve the overall coherence and impact of the introduction.

Suggestions: It is essential that all figures and tables, particularly those containing text in Urdu, are accompanied by English translations or explanations. Although the manuscript contains translations for some examples, a few figures still contain Urdu text, which may not be accessible to a global audience. Please ensure all visual content is fully translated to make the paper more inclusive.

Author Response: Dear Reviewer, thank you for this important observation. We fully agree that all figures and tables must be accessible to a global audience. In response to your suggestion, we have reviewed the manuscript and ensured that all figures and tables are now accompanied by English translations or explanations. However, certain Urdu words remain untranslated, as translating them could confuse the reader.

 

Suggestions: While the methodology section is comprehensive, a clearer breakdown of how the data augmentation was applied could provide additional clarity. Additionally, more details on the hyperparameters, model architecture, and computational setup, especially in Table 3, would make it easier for other researchers to replicate the study. A suggestion would be to provide more technical specifics on how the model parameters (e.g., learning rate, optimizer choices) affected performance during experimentation.

Author Response: Dear Reviewer, thank you for your insightful suggestion. We have added a table (Table No. (   )) to the supplementary file that details each step used during data augmentation. Regarding more technical specifics on how model parameters, such as learning rate and optimizer choices, affected performance during experimentation: we recognize the importance of these details for a deeper understanding of the model’s behavior. The methodology section already includes a more detailed discussion of the impact of these parameters. Additionally, we have provided an analysis of different optimizer choices and their effect on the overall performance of the model. These details have been incorporated into Table 3 and the accompanying text. We believe they will enhance the clarity of our work and provide valuable insights for other researchers looking to replicate or extend our study.

 

 Suggestions: While the results show a clear improvement in performance with the augmented dataset, the analysis could benefit from a more detailed breakdown by entity type. The confusion matrix is a good addition, but more insights into why specific entity types (e.g., PERSON or ORGANIZATION) perform better or worse could deepen the reader's understanding. Expanding the error analysis to discuss potential reasons for misclassifications would strengthen the discussion.

Author Response: Dear Reviewer, thank you for your valuable suggestion. We appreciate your observation regarding the need for a more detailed breakdown of performance by entity type. In our analysis, the improved performance on specific entity types, such as PERSON and ORGANIZATION, can be attributed to their more balanced representation in the augmented dataset. This balance likely helped the model generalize better for these entities than for others. Moreover, we have expanded the error analysis section to discuss potential reasons for misclassification.
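For readers who want to reproduce the kind of per-entity-type breakdown requested here, the following is a minimal, illustrative sketch, not the authors' evaluation code. It assumes flat token-level tags (e.g. "PERSON", "LOCATION", "O") rather than BIO spans, and excludes "O" from the macro average; both are assumptions, and span-level scoring (as in tools such as seqeval) would give different numbers. The false-positive and false-negative counts also show which types absorb each other's errors, e.g. LOCATION tokens predicted as ORGANIZATION.

```python
from collections import Counter


def per_entity_scores(gold, pred, ignore=("O",)):
    """Token-level precision, recall and F1 for each entity type.

    Assumption: gold and pred are equal-length lists of flat tag strings.
    Tags listed in `ignore` are left out of the per-type table.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the model predicted p, but it was wrong here
            fn[g] += 1  # the true tag g was missed
    labels = sorted((set(gold) | set(pred)) - set(ignore))
    scores = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = {"precision": prec, "recall": rec, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-type F1 scores."""
    return sum(s["f1"] for s in scores.values()) / len(scores) if scores else 0.0


# Toy, hypothetical tags (not data from the paper).
gold = ["PERSON", "O", "LOCATION", "ORGANIZATION", "O", "PERSON"]
pred = ["PERSON", "O", "ORGANIZATION", "ORGANIZATION", "O", "O"]
scores = per_entity_scores(gold, pred)
for lab, s in scores.items():
    print(f"{lab}: F1={s['f1']:.3f}")
print(f"macro F1={macro_f1(scores):.3f}")
```

On this toy input, LOCATION scores 0.000, ORGANIZATION and PERSON each score 0.667, and the macro F1 is 0.444. A single LOCATION/ORGANIZATION confusion therefore lowers both types, which is the kind of pattern an entity-level error analysis would surface.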

Suggestions: As noted earlier, the integration of the related work into your study could be improved. Some of the studies are mentioned but not explicitly connected to how they influence your approach or findings. Linking specific methods from the literature to your methodology or results would clarify how your work builds on and differs from existing research.

Author Response: Dear Reviewer, thank you for your insightful feedback. We have revised the manuscript to explicitly connect the cited studies to our methodology and findings, linking specific methods from the literature to our approach and highlighting how our work builds on and differs from existing research. These changes should provide clearer context and show the progression from previous studies to our current work.

 Suggestions: Delve deeper into the factors contributing to the errors identified, such as misclassifications due to tokenization issues or ambiguity in certain named entities. Discussing strategies for mitigating these issues, such as experimenting with advanced tokenization techniques or expanding the diversity of training data, would provide valuable insight for future work.

Author Response: Dear Reviewer, thank you for your comment. We have added a discussion of advanced strategies for mitigating misclassifications and tokenization errors, including more sophisticated subword tokenization techniques such as WordPiece, Byte-Pair Encoding (BPE), and SentencePiece. Collectively, these strategies can significantly enhance the model's performance and reduce errors in future applications.

Suggestions: Expanding on how the CWEA method and the multilingual BERT model could be generalized to other languages with similar challenges would strengthen the impact of the research. Highlighting potential applications beyond the Urdu language, such as in languages with different scripts or those that share certain structural features with Urdu, would broaden the paper's relevance.

Author Response: Dear Reviewer, thank you for your valuable suggestion. Regarding generalization, the CWEA augmentation method and the multilingual BERT model can handle languages with complex structures and non-Latin scripts by leveraging context-aware word embeddings and pre-trained multilingual representations. This approach enhances data representation and model performance, making it applicable to languages such as Arabic, Persian, and Pashto, which share similar linguistic challenges with Urdu. Furthermore, we have highlighted the potential applications of these models to other languages in the revised manuscript.

Suggestion: The future work section could benefit from more specific proposals for next steps. For instance, exploring new domains such as education (as mentioned) or expanding the model to handle code-switching in multilingual contexts could be valuable. Additionally, consider proposing experiments with larger datasets or other language models to further explore the capabilities of your approach.

Author Response: Dear Reviewer, thank you for your valuable suggestions. We appreciate your input and have incorporated your recommendations into our future work plans. Specifically, we aim to develop and curate a large, comprehensive dataset tailored to the education domain, which will include a broader variety of entity types to support more diverse and nuanced NLP applications in this sector. This dataset will allow us to refine our models and broaden their applicability across different domains and linguistic scenarios. We believe these additions align well with your suggestions and will strengthen the impact and relevance of our future research.

 Comments on the Quality of English Language

  1. Clarity and Conciseness:

   - Issue: Certain sections, such as the introduction and related work, tend to be wordy and could benefit from a more concise structure.

Author Response: Dear reviewer, we have revised the introduction and related work sections to reduce wordiness and improve clarity. The revised sections now have a more streamlined structure, focusing directly on the key points to enhance readability and ensure the content is more engaging.

  2. Grammar and Syntax:

   - Issue: While the overall grammar is satisfactory, there are occasional errors in word order, missing articles, and awkward phrasing. This slightly affects the readability and flow of the manuscript.

Author response: Dear Reviewer, we have carefully reviewed the manuscript and made corrections to improve word order, ensure appropriate use of articles, and eliminate awkward phrasing. These revisions have enhanced the readability and flow of the manuscript, making it clearer and more refined.

  3. Use of Non-English Text:

   - Issue: The manuscript includes some Urdu text, especially in figures, tables, and examples, which may not be accessible to all readers. This non-English content can make the manuscript difficult to understand for a broader audience.

Author Response: We have added English translations and explanations alongside all Urdu text in the figures, tables, and examples. This ensures that the content is clear and understandable for all readers, regardless of their familiarity with the Urdu language.

  4. Figures and Images:

   - Issue: Some figures contain Urdu text or annotations that are not fully translated into English. Additionally, there is a lack of clarity in some images, and the context of their inclusion could be better explained.

Author response: Dear Reviewer, we have addressed your concerns by ensuring that all figures containing Urdu text or annotations are now translated into English. Additionally, we have improved the clarity of the images and provided more detailed explanations to better contextualize their inclusion in the manuscript.

  5. Terminology Consistency:

   - Issue: Some terminology is used inconsistently throughout the manuscript. For instance, named entity categories such as PERSON, LOCATION, ORGANIZATION, etc., are sometimes not consistently referenced.

Author Response: Dear reviewer, we have carefully reviewed the manuscript and standardized the references to named entity categories such as PERSON, LOCATION, ORGANIZATION, and others. Consistent terminology has been ensured across the entire text to improve clarity and avoid any confusion.

  6. Typographical Errors:

   - Issue: Minor typos, such as missing prepositions, misplaced commas, or inconsistent punctuation, occasionally occur throughout the text.


Author Response: Dear Reviewer, we have conducted a thorough review of the text and corrected the minor typos, including missing prepositions, misplaced commas, and inconsistent punctuation. These revisions ensure that the manuscript is free from such errors, improving its overall quality and readability.

  7. Technical Jargon and Explanation:

   - Issue: Some of the technical jargon used in the paper may not be immediately clear to readers who are not deeply familiar with NLP, especially regarding model types and tokenization processes.

Author Response: Dear reviewer, we have already provided detailed descriptions of the models, which should offer valuable insights to readers from different fields.

  8. Flow and Organization:

   - Issue: The organization of some sections, particularly the transition between related work, methodology, and results, could be smoother.

Author Response: Dear reviewer, we have revised the transitions between the related work, methodology, and results sections to create a smoother and more coherent flow.

Author Response File: Author Response.docx
