Article
Peer-Review Record

AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM

Appl. Sci. 2022, 12(3), 1182; https://doi.org/10.3390/app12031182
by Yijie Pei 1,†, Siqi Chen 1,†, Zunwang Ke 2,*, Wushour Silamu 2 and Qinglang Guo 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 23 November 2021 / Revised: 8 January 2022 / Accepted: 20 January 2022 / Published: 24 January 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

The topic is interesting and in a popular domain.

There are some typos (like Hungarian instead of English: this is weird, though). Also, make sure all the abbreviations are explained before being used (like "P" for "Precision" in 4.5).

I've never been a big fan of augmenting data in general and in text analysis in particular.

In NLP the key is the semantics, and introducing exogenous elements can potentially change them.

Most of the recent papers in NLP focus on a mechanistic approach that is far from what language is about. Augmentation is another example of not considering the semantic aspects of language.

On the other hand, you said "compared with the method without data enhancement, there was also a great improvement"; this is not very evident from Figures 10 and 11, where the two LaBSE variants (augmented and not) are not "greatly different".

Further evidence should be provided.

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM” (ID: ISSN 2076-3417). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope meet with your approval. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

1. Response to comment: some typos.

Response: We are very sorry that, due to our negligence, some spelling mistakes and incorrect expressions appeared in the text. We have carefully inspected and corrected the full text, and we explain in detail the meaning of ‘P’, ‘R’, and ‘F1’ in Section 4.4.
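For reference, the metrics P, R, and F1 discussed here reduce to the usual count-based definitions. A minimal sketch (the counts below are illustrative, not taken from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0   # P: fraction of predicted positives that are correct
    r = tp / (tp + fn) if tp + fn else 0.0   # R: fraction of actual positives that are found
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # F1: harmonic mean of P and R
    return p, r, f1

# e.g. 80 correct positives, 20 false alarms, 10 misses:
p, r, f1 = precision_recall_f1(80, 20, 10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.842
```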

2. Response to comment: the semantics, and introducing exogenous elements potentially changes them.

Response: We also prefer not to use augmentation, but we have to acknowledge its excellent effect on the low-resource Uyghur language.

It is worth mentioning that the data-enhancement method we use is not EDA, which has commonly been used in the past. EDA changes the sequence information of the original text, whether through synonym replacement, random replacement, random insertion, or random deletion. The AEDA method, however, only inserts punctuation marks and does not significantly modify the sequence information of the original data. In our opinion, modifying words and thereby changing the original semantics would be more harmful; in contrast, inserting only a few punctuation marks does not change the word order of the original text, and although it adds noise, the result is not negative.

Finally, according to the experiments in the AEDA paper, if we compared against a traditional RNN or CNN, the effect of our data enhancement would be very obvious. In this paper, however, all training uses a cross-language pre-training model, and the pre-training model itself already performs very well. In the original AEDA paper, the BERT model was used and data enhancement improved results by only a few tenths of a point; it can therefore be concluded that obtaining any improvement on top of a pre-training model is meaningful.
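The AEDA-style insertion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the insertion ratio, punctuation set, and example sentence are assumptions based on the description (punctuation is inserted at random positions while the word order stays intact):

```python
import random

PUNCTUATIONS = [".", ";", "?", ":", "!", ","]  # candidate marks to insert

def aeda(sentence, ratio=0.3, seed=None):
    """Insert random punctuation marks between words; the word order is untouched."""
    rng = random.Random(seed)
    words = sentence.split()
    # Choose how many marks to insert, at most a fixed fraction of the word count.
    n_insert = rng.randint(1, max(1, int(ratio * len(words))))
    positions = set(rng.sample(range(len(words)), min(n_insert, len(words))))
    out = []
    for i, w in enumerate(words):
        if i in positions:
            out.append(rng.choice(PUNCTUATIONS))
        out.append(w)
    return " ".join(out)

original = "this film is a quiet and moving portrait"
augmented = aeda(original, seed=42)
# Removing the inserted punctuation recovers the original word sequence exactly.
assert [w for w in augmented.split() if w not in PUNCTUATIONS] == original.split()
```

This makes the contrast with EDA concrete: deleting the inserted marks restores the original token sequence, whereas synonym replacement or random deletion cannot be undone.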

We tried our best to improve the manuscript and made some changes accordingly. These changes do not influence the content and framework of the paper. We submit the revised paper as an attachment.

We earnestly appreciate the reviewers’ hard work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions. They are very important for improving the quality of our paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript is centered on a very interesting and timely topic, which is also quite relevant to the themes of Applied Sciences. Organization of the paper is good and the proposed method is quite novel. The length of the manuscript is about right.

The paper, however, does not link well with recent literature on sentiment analysis that has appeared in relevant top-tier journals, e.g., the IEEE Intelligent Systems department on "Affective Computing and Sentiment Analysis". Also, new trends in the ensemble application of symbolic and subsymbolic AI for sentiment analysis are missing.

Authors seem to handle sentiment analysis simply as a binary classification problem (positive versus negative). What about the issue of neutrality or ambivalence? Check relevant literature on detecting and filtering neutrality in sentiment analysis and recent works on sentiment sensing with ambivalence handling.

Finally, the manuscript only cites a few papers from 2021: check new perspectives for neural tensor networks and recent works on multiplicative attention mechanism for aspect category detection.

The manuscript presents some bad English constructions, grammar mistakes, and misuse of articles: a professional language editing service (e.g., the ones offered by IEEE, Elsevier, or Springer) is strongly recommended in order to sufficiently improve the paper's presentation quality for meeting the high standards of Applied Sciences.

Finally, double-check both definition and usage of acronyms: every acronym, e.g., NLP, should be defined only once (at the first occurrence) and always used afterwards (except for abstract and section titles). Also, it is not recommendable to generate acronyms for multiword expressions that are shorter than 3 words, e.g., DA (unless they are universally recognized, e.g., AI).

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM” (ID: ISSN 2076-3417). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope meet with your approval. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

1. Response to comment: IEEE, aspect category detection.

Response: We think you are quite right about this point. Therefore, references 13, 14 and 15 have been added in the introduction to further explain the latest trends in sentiment analysis.

2. Response to comment: binary classification problem.

Response: We build and use two datasets. One is a binary classification dataset; it is non-public and is inherently a two-class dataset. Regarding neutrality, our five-category dataset contains five emotions: happy, sad, surprised, angry, and neutral.

3. Response to comment: English.

Response: We are very grateful for your advice. According to your requirements, we have corrected all the abbreviations in the full text, such as NLP and DA.

4. Response to comment: definition.

Response: We are very grateful for your advice. According to your requirements, we have corrected all the abbreviations in the full text, such as NLP and DA.

We tried our best to improve the manuscript and made some changes accordingly. These changes do not influence the content and framework of the paper. We submit the revised paper as an attachment; the individual changes are not listed here.

We earnestly appreciate the reviewers’ hard work and hope that the corrections will meet with approval.

Once again, thank you very much for your comments and suggestions. They are very important for improving the quality of our paper.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper presents a method and a dataset for Uyghur sentiment and emotion analysis. Uyghur is a low-resource language, and an additional dataset will be welcomed by the NLP community. It is vital for the dataset to be made publicly available to the community.

Evaluation results show the superiority of the method over several baselines and SOTA multilingual methods. The paper is well-written but is overly detailed and can be shortened.

Detailed comments:

Fig.1 : is not aligned properly 

l.121-122 This is emotion analysis, not SA.  

l.127 Uyghur languages, plural - this issue needs to be clarified, which dialects are handled? 

l.131 Upper and lower cultures is undefined, what are they??

Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L. and Stoyanov, V., 2019. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.

l.276 BiLSTM is very standard these days, it does not need a separate section, just its name, and reference.

l.330 This division is highly non-standard, it should be 75-5-20.

Table 1: This does not seem correct -- there is less data after the augmentation. How can it be? The point of augmentation is to increase data size.

Section 4.2 There is no need to dedicate a section to P,R,F1, they are very standard.

Fig.6 This figure is redundant, I'd much rather see runtime information.

Fig.7: It will look much better as a table. The numbers in the figure are very small and hard to analyze.

Fig.8+9: Please replace by tables.

Fig 10+11: An indication of whether score drops are statistically significant or not should be added. 

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM” (ID: ISSN 2076-3417). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope meet with your approval.

The main corrections in the paper and the responses to the reviewer’s comments are as follows:

1. Response to comment: Fig. 1.

Response: We have corrected this problem in the revised paper by centering the figure.

2. Response to comment: this is emotion analysis, not SA.

Response: We are sorry for our negligence on this issue. We have corrected this to emotion analysis for the five-category task throughout the paper.

3. Response to comment: Uyghur languages.

Response: We are very sorry for our mistake in this detail, and we have corrected it throughout the text. Our experiments are based on a single dialect, not on multiple Uyghur languages.

4. Response to comment: upper and lower cultures is undefined.

Response: We propose a method that adds BiLSTM layers, in which the outputs of the cross-lingual pre-trained model are fed into BiLSTM layers to better learn context features. In this task, the method can better select relevant semantic and feature information from the pre-trained language model, and it effectively improves the performance of downstream tasks by exploiting the context-association characteristics of agglutinative languages. We have made detailed improvements to the introduction of the article.

5. Response to comment: BiLSTM.

Response: We refer to several journal papers. In order to keep the description of the module complete and detailed, we have tried to retain this content, which explains in detail how BiLSTM can better learn context features. We also explain that in this experiment we added a dropout layer to prevent overfitting, followed by a linear transformation of the dropout output.
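The pipeline described here (pre-trained embeddings → BiLSTM → dropout → linear head) can be sketched in plain NumPy. This is an illustrative forward pass only, not the authors' implementation; all dimensions, the random weights, and the single-example setup are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(x, W, U, b, reverse=False):
    """Run one LSTM direction over x of shape (seq_len, d_in); return the final hidden state."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    steps = reversed(range(len(x))) if reverse else range(len(x))
    for t in steps:
        z = W @ x[t] + U @ h + b               # stacked input/forget/cell/output pre-activations
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

seq_len, d_in, d_h, n_classes = 12, 768, 64, 5
x = rng.normal(size=(seq_len, d_in))           # stand-in for pre-trained token embeddings

def init(d_in, d_h):
    return (rng.normal(scale=0.1, size=(4 * d_h, d_in)),
            rng.normal(scale=0.1, size=(4 * d_h, d_h)),
            np.zeros(4 * d_h))

h_fwd = lstm_pass(x, *init(d_in, d_h))                  # left-to-right context
h_bwd = lstm_pass(x, *init(d_in, d_h), reverse=True)    # right-to-left context
h = np.concatenate([h_fwd, h_bwd])                      # BiLSTM summary, shape (2*d_h,)

# Training-time (inverted) dropout on the BiLSTM output, then a linear classifier head.
p = 0.5
mask = (rng.random(h.shape) >= p) / (1 - p)
logits = rng.normal(scale=0.1, size=(n_classes, 2 * d_h)) @ (h * mask)
print(logits.shape)  # (5,)
```

In a real model the weights would of course be trained; the sketch only shows why the bidirectional pass doubles the feature dimension before the classifier.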

6. Response to comment: this division is highly non-standard.

Response: We follow several journal papers, in which a 70–30 split is also common.

7. Response to comment: Table 1.

Response: We are very sorry; this was our mistake, and we have corrected it.

8. Response to comment: Section 4.2.

Response: We think you are right and have deleted this section, but another reviewer asked us to explain ‘P’, ‘R’ and ‘F1’, so we have added a brief explanation at the beginning of Section 4.4.

9. Response to comment: Fig. 6.

Response: We are sorry, but we cannot delete it. Since we address a classification task, we need to explain the categories, which is common practice in journals.

10. Response to comment: Fig. 7, Figs. 8+9.

Response: Please see Table 5 and Table 6; we have made the modifications.

11. Response to comment: Figs. 10+11.

Response: In the ablation experiments, we demonstrate the role of the key factors in our approach, training and testing with and without data enhancement and with and without the BiLSTM layer on the two datasets. The specific data are shown in Figure 8, and the detailed reasons are explained in the paper.

We tried our best to improve the manuscript and made some changes accordingly. These changes do not influence the content and framework of the paper. We submit the revised paper as an attachment; the individual changes are not listed here.

Once again, thank you for your rigorous attitude; your comments on our paper are very valuable and useful. We earnestly appreciate the reviewers’ hard work and hope that the corrections will meet with approval.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

There is still something missing in the Results/Analysis.

Figures 8 and 9 seem to compare the different approaches, but the interpretations are not much in line with what is in the graphs.

For "The effect of augmentation", Figure 8 (BTW, it should probably be "negative", not "navigate") shows no difference; Figure 9 shows not much of a difference, not the "great improvement" added to the text.

For "The effect of BiLSTM" it's pretty much the same.

Unless there is something not in the text, data and interpretation do not match.

If there is something missing, the interpretation should be more detailed.

 

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM” (ID: ISSN 2076-3417). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope meet with your approval. The main corrections in the paper and the responses to the reviewer’s comments are as follows:


First of all, we checked the spelling of the full text according to your suggestion and corrected the misspelled words.
Secondly, starting from line 425, we improved the interpretation of Figures 8 and 9. Figure 8 analyzes the results of the ablation experiment for the binary classification dataset; compared with the baseline model LaBSE, it can be seen that the overall results improve after data augmentation and after adding a BiLSTM layer. Figure 9 shows the same analysis for the five-category dataset. The replacement chart makes this much clearer.


We tried our best to improve the manuscript and made some changes accordingly. These changes do not influence the content and framework of the paper. We submit the revised paper as an attachment.


We earnestly appreciate the reviewers’ hard work and hope that the corrections will meet with approval.
Once again, thank you very much for your comments and suggestions. They are very important for improving the quality of our paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

The pdf has not been properly compiled. All references are missing.

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM” (ID: ISSN 2076-3417). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope meet with your approval. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

First of all, we checked the spelling of the full text according to your suggestion and corrected the misspelled words.
Secondly, the references for our paper are on pages 15 and 16, and readers can jump to a reference by clicking on the numbers in the text. In the LaTeX source, our references are kept in reference.bib.

We tried our best to improve the manuscript and made some changes accordingly. These changes do not influence the content and framework of the paper. We submit the revised paper as an attachment.
We earnestly appreciate the reviewers’ hard work and hope that the corrections will meet with approval.
Once again, thank you very much for your comments and suggestions. They are very important for improving the quality of our paper.

Author Response File: Author Response.pdf
