Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks
Round 1
Reviewer 1 Report
This paper proposes three data augmentation methods (cognate-based, antonym-based, and antipode-based schemes) for enhancing robustness in text classification. I have some comments to improve the overall quality of the paper.
1) The authors need to clearly provide performance results in the abstract.
2) The authors need to clearly describe their proposed data augmentation schemes.
3) The authors need to redraw some unclear result figures.
4) The authors need to provide more information on the data augmentation results, including the reasons for the performance gains, to verify their proposed solutions.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The authors propose a method for analyzing text classification robustness via data augmentation. The following issues should be addressed before this paper can be accepted for publication:
This paper should have a related work section describing similar papers on the same topic.
Robustness should also be discussed in the context of other applications. The authors should mention other applications where their methodology could be useful for evaluating or enhancing the robustness of the classification task, for example topic segmentation or authorship recognition (e.g., doi: 10.1063/1.4954215, doi: 10.1371/journal.pone.0193703, arXiv: abs/1802.10135).
The three new data augmentation methods utilizing sentiment words and antonyms should be better motivated.
Figure 1 is confusing. It is actually text, not a figure.
Is it possible to include more datasets?
The SHAP values are not informative. None of the features appears to play an important role.
The discussion is too short. The authors should include more discussion.
The concept of robustness also occurs in other areas. The authors could explain this concept in the introduction.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
Summary
The authors propose 3 new data augmentation methods to enhance the robustness of NLP models. They compare their methods with two existing methods and report better performance.
Strengths
The references are relevant and recent.
The paper is relatively easy to follow and moderately well-written.
The datasets and models used for evaluation are publicly available, which aids reproducibility, though the authors do not say whether they will release their code, which is also important for reproducibility.
Weaknesses
A related work (or literature review) section is missing in the paper. A large part of what is in the introduction should rather be in a related work section.
Based on line 80, I don’t think the contribution is really novel, given that the techniques are based on sentiment words and antonyms.
Antipode 3 and Antonym 2 are said to be the most significant and exceptional in Figures 2 and 3, but the authors give no reasons for this.
The authors do not say if they will release their code to foster reproducibility.
No confusion matrix is provided for any of the results, so the strengths and weaknesses of the models cannot be seen in detail.
The authors claim improved robustness for their methods over two other methods but no explanation or analysis of why this is the case is given.
The Discussion section merely restates what the authors have said in previous sections. Instead, the authors should discuss new insights, the implications, and the scope of their work. It may also include the limitations, as it currently does.
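To make the confusion-matrix point above concrete, the following is a minimal sketch of the kind of per-class breakdown that could accompany the reported accuracy and F1-Score. The labels, predictions, and function names here are hypothetical and are not taken from the paper; the sketch only assumes a binary sentiment task.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix with rows = true label and columns = predicted label."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_recall(matrix):
    """Recall per class: diagonal count divided by the row total."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Hypothetical data for illustration only (not results from the paper).
labels = ["neg", "pos"]
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos"]

matrix = confusion_matrix(y_true, y_pred, labels)
recall = per_class_recall(matrix)

for label, row in zip(labels, matrix):
    print(label, row)
print("recall per class:", recall)
```

Reporting such a matrix for each augmentation method would show which classes gain or lose from augmentation, which a single aggregate score cannot reveal.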
Comments
The paper needs proofreading and a grammar check. For example, the statement on line 16 of the abstract is incomplete: “… we found the accuracy and F1-Score of prediction models trained by our augmented datasets.”
Other areas that need grammar correction include, among others, lines 69 and 70.
Line 28 needs context. Saying “which guarantee high accuracy” is a sweeping statement, which is not necessarily correct, since accuracy depends on a number of factors.
Line 32 “These studies” should be singular since this is a single paper.
I think the datasets should be described in Materials and Methods rather than in the experiments section.
Line 159, instead of “obey”, “follow” is the appropriate word.
There is undue repetition on line 176; the point is already made on line 168. The last clause on line 190 is also unnecessary.
Figure 3 shows losses in performance, not improvements, since the values decrease. This should be made clear in the figure. This is mostly the case in Figure 10 as well.
Page 14 is totally blank.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
I'm satisfied with the revision.