Article
Peer-Review Record

Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification

Electronics 2021, 10(3), 270; https://doi.org/10.3390/electronics10030270
by Hanqian Wu 1,2,*, Zhike Wang 1,2, Feng Qing 1,2 and Shoushan Li 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 December 2020 / Revised: 10 January 2021 / Accepted: 11 January 2021 / Published: 23 January 2021
(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)

Round 1

Reviewer 1 Report

The manuscript is centered on an interesting topic. The organization of the paper is good, and the proposed method is quite novel.

The manuscript, however, does not link well with the recent literature on sentiment analysis that has appeared in relevant top-tier journals, e.g., the IEEE Intelligent Systems department on "Affective Computing and Sentiment Analysis". The latest trends in multilingual sentiment analysis are also missing, e.g., see Lo et al.'s recent survey on multilingual sentiment analysis (from formal to informal and scarce-resource languages). Finally, check recent resources for multilingual sentiment analysis, e.g., BabelSenticNet.

The manuscript presents some poor English constructions, grammar mistakes, and misuse of articles: a professional language editing service (e.g., those offered by IEEE, Elsevier, and Springer) is strongly recommended to bring the paper's presentation quality up to Electronics' high standards.

Finally, double-check both the definition and the usage of acronyms: every acronym should be defined only once (at its first occurrence) and used consistently afterwards (except in the abstract). Also, we do not recommend generating acronyms for multiword expressions shorter than three words.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper titled "Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification" explores the capability of transformer-based models for transferring the sentiment classification task across languages. The proposed solution addresses some of the issues that arise when aligning two domains (in a cross-lingual setting) by exploiting the reinforcement learning paradigm. The resulting method is reported to outperform the state of the art.

 

Comments:

 

1.) "Cross-Lingual" appears twice in the title, which is very redundant.

2.) Line 2: focus -> focuses?

3.) Line 26: indeed available -> just available?

4.) Line 27: amount -> number of?

5.) Line 64: the line break is redundant.

6.) At, e.g., line 72, the reader does not know what exactly adversarial learning is. Please clarify this further in the introduction section.

7.) line 104: Reinforced transformer? Do you mean reinforcement learning-based transformer here?

8.) Figure 1: caption "will be not involved" is incorrect phrasing.

9.) What do you mean by "freeze"? Does that mean you do not tune the parameters?

10.) First conceptual question here: the reinforcement step is responsible for the selection of tokens. Intuitively, this idea works only if the tokens are selected prior to being used in the padding space, as otherwise you are introducing false-positive deletions (a_i = 0 by default).

11.) I'd say the point of soft attention is that it in fact learns to discard some token-token relations. The space of attention heads in, e.g., BERT is rather sparse in this regard. Please comment on that and how it relates to your claim about hard attention (line 145).

12.) Equation (3) is not all that clear. It reads as if vector a is approximated by RATS(some parameters). Isn't this done for individual inputs? It looks like you are learning to discard positions in general, which makes no sense here.

13.) Equation (4) and around -> transformer layers? This is not a thing; the transformer is a type of architecture. It seems you mean the attention layers here? It also looks like it is based on F3.

14.) Eq. (5): Discuss T in more detail -> why is it set to 1? This makes Eq. (5) an ordinary softmax then (see the sketch after this comment list).

15.) After Eq. (6) you claim T is set to 1 for testing. What about training? Is it learned then?

16.) In Section 3.3 you discuss the implications of adversarial verification of the input sequences. Could you give an example here?

17.) Figure 4: the labels are too small; perhaps split the figure into two rows.

18.) It seems the adversarial mode does not help much. I suggest you do an additional ablation study as to why this is the case.

19.) The Conclusions section (5) is too short. Please additionally discuss at least what other data this could work on and what the main limitations are, and provide a link to the source code which replicates the experiments.

20.) How did you select the datasets? The choice seems rather odd in terms of what is currently available, e.g., https://www.aclweb.org/anthology/I17-1051/
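To make the point behind comments 14 and 15 concrete: with T = 1 a temperature-scaled softmax reduces to the ordinary softmax, while larger T only softens the distribution (the usual motivation in distillation). A minimal sketch of this behaviour, in plain NumPy with illustrative names that are not taken from the manuscript:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: T = 1 gives the ordinary softmax,
    T > 1 softens the distribution (as used for distillation targets)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, T=1.0))  # sharper: ordinary softmax
print(softmax_with_temperature(logits, T=4.0))  # softer: closer to uniform
```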

 

Overall, the paper is acceptable once these concerns are resolved and the language is additionally corrected.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper makes two contributions: it develops a new dataset and describes an experiment with hard attention for machine translation. Both of these are diminished without the authors publishing the data and the code.

I find the article interesting, and overall well written. It seems to be modestly original. For example, "Look Harder: A Neural Machine Translation Model with Hard Attention" by Indurthi et al. (2019) talks about a similar idea, and it should be referenced and compared. The same goes for "Attention-Aware Sampling via Deep Reinforcement Learning for Action Recognition" by Dong et al. (2019). (Neither, I think, deals with EN-CH translation.)

I do have some issues with the presentation: minor ones with the English, more substantial ones with the technical exposition.

I'll start with the language, as this is the easiest to improve: you should recheck for missing determiners, agreement, punctuation, etc., e.g., lines 1, 9, 116, 145 (really confusing reading without a determiner). Otherwise I did not find the writing interfering with my understanding.

 

On the other hand I'm confused about several technical aspects: 

  •  I had the most difficulty following your presentation in lines 166-180, and I strongly suggest you rewrite it for clarity.
    • You refer to Ganin before introducing Eq. (9). I have no problem following Ganin et al. [20], but I don't understand (9). What is the meaning of a loss function formed as the arithmetic sum of one term that is minimized and another that is maximized? (lines 167-168)
    • In 173, shouldn't you reference Eq. 2 for R? 
    • The RAWS module is nowhere defined.
    • The REINFORCE algorithm is nowhere defined (see the sketch after this list for the kind of policy-gradient update I assume is meant).
  • Line 135: the "~" (sampling) notation does not appear again until much later; perhaps the definition could be postponed until it is needed.
  • Line 137, Eq. (2): what are N' and N?
  • Fig. 2: why are there dashed lines around the LSTM/state network? Fig. 1 uses the dashed = training convention, but I don't think it's appropriate in Fig. 2 and Fig. 3.
  • In Fig.3 I don't understand the transformation H_A ---average--- H_A
  • Why was the temperature reset to 1 for testing (line 158)?
  • Line 236: I don't think 'significant' is an appropriate word to use here. The only somewhat 'significant' improvement is on EN-CH; the other ones are good (1% F1). But note that this is a test on one dataset only.
  • Error analysis: it should perhaps be mentioned that the observed errors are typical. E.g., "Deep Unordered Composition Rivals Syntactic Methods for Text Classification" by Iyyer et al. discusses the same phenomena, and it should be referenced and compared.
  • Overall, I like the idea of experimenting with a binary 'policy network', a.k.a. 'hard attention', and improving EN-CH translation. However, both the data and the code should be made public before the paper is accepted.
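As context for the points above on the REINFORCE algorithm and the binary 'policy network': what I assume is meant is a keep/drop policy over token states trained with the REINFORCE policy-gradient estimator. A generic sketch under that assumption, in PyTorch-style Python with illustrative names that are not taken from the manuscript:

```python
import torch
import torch.nn as nn

class BinaryTokenPolicy(nn.Module):
    """Generic hard-attention policy: scores each token state and samples a
    binary keep/drop action per token; trained with REINFORCE."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):                  # (batch, seq_len, hidden_dim)
        keep_prob = torch.sigmoid(self.scorer(token_states)).squeeze(-1)
        dist = torch.distributions.Bernoulli(probs=keep_prob)
        actions = dist.sample()                        # hard 0/1 mask per token
        log_prob = dist.log_prob(actions).sum(dim=-1)  # log-prob of the sampled mask
        return actions, log_prob

policy = BinaryTokenPolicy(hidden_dim=16)
states = torch.randn(2, 5, 16)                         # toy batch of token states
actions, log_prob = policy(states)
reward = torch.randn(2)                                # placeholder task reward per example
loss = -(reward * log_prob).mean()                     # REINFORCE: maximize expected reward
loss.backward()
```

In practice a baseline is subtracted from the reward to reduce variance, which is exactly the part I would expect the paper to spell out.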

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have addressed the raised concerns. I would urge the authors to re-check the manuscript for typographical errors though.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
