Article
Peer-Review Record

Attention-Enhanced Graph Convolutional Networks for Aspect-Based Sentiment Classification with Multi-Head Attention

Appl. Sci. 2021, 11(8), 3640; https://doi.org/10.3390/app11083640
by Guangtao Xu, Peiyu Liu *, Zhenfang Zhu, Jie Liu and Fuyong Xu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 1 March 2021 / Revised: 6 April 2021 / Accepted: 16 April 2021 / Published: 18 April 2021
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

This manuscript presents a novel GNN for aspect-based sentiment classification, an attention-enhanced GNN, intended to resolve the issue that GNN performance is affected by noise and dependency-tree parsing. This novel method is very interesting and important, but some corrections may be required. Firstly, the construction of the GNN should be represented in figures. Also, the performance of the novel method was assessed only by accuracy and F1; other assessment indexes should be added, and the discussion should be extended by comparing these indexes. In addition, each performance examination should be performed as a multiple test, and the average and standard deviation of each performance index should be added. Also, the advantages and disadvantages of this novel approach should be added.

Author Response

Thank you very much for your valuable suggestions, and we apologize for any inconvenience caused. Our responses to your comments follow.

Q.1: Firstly, the construction of the GNN should be represented in figures.

A.1: To facilitate understanding, we have added a detailed structure diagram of the AEGCN layer in the third section of Chapter 3 (page 6, Figure 3).

Q.2: Other assessment indexes should be added, and the discussion should be extended by comparing these indexes.

A.2: At present, the mainstream evaluation metrics in this field are Accuracy and Macro-F1. Therefore, to allow comparison with other models, our article also uses Accuracy and Macro-F1 as evaluation metrics. To evaluate the model further, we used visual analysis in Chapter 4, Section 5 to verify that our model can indeed correctly capture the connection between aspect words and opinion words.
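For reference, a minimal sketch of how these two metrics are typically computed (using scikit-learn; the label arrays are illustrative placeholders, not data from the paper):

```python
# Minimal sketch: computing the two evaluation metrics named above.
# y_true / y_pred are illustrative placeholder arrays.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 1, 0, 2]   # gold sentiment labels (e.g., neg/neu/pos)
y_pred = [0, 1, 1, 1, 0, 2]   # model predictions

acc = accuracy_score(y_true, y_pred)
# Macro-F1: per-class F1 scores averaged with equal weight per class,
# so minority classes count as much as majority ones.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"Accuracy: {acc:.4f}, Macro-F1: {macro_f1:.4f}")
```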

Q.3: In addition, each performance examination should be performed as a multiple test, and the average and standard deviation of each performance index should be added.

A.3: As mentioned in the first section of Chapter 4 of this article (page 7, line 240), all reported results for our model are the average of three runs with random initialization. For all models, the hyperparameters are tuned and the results averaged over one or more runs with the chosen hyperparameters; to remain comparable with other models, we follow the same protocol. Reporting the mean and standard deviation for every hyperparameter configuration would multiply the number of experiments dozens of times, sharply increasing the computational cost, so we retain the original method.
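As an illustration, the protocol described above amounts to something like the following sketch; train_and_evaluate is a hypothetical stand-in for the actual training loop:

```python
# Illustrative sketch: average the test metric over several random
# initializations, as reported in the paper.
import random
import numpy as np
import torch

def train_and_evaluate() -> float:
    # Hypothetical placeholder: in practice this trains the model
    # and returns the test accuracy.
    return float(torch.rand(1).item())

def run_with_seed(seed: int) -> float:
    # Fix all relevant RNGs so the run is reproducible for this seed.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    return train_and_evaluate()

scores = [run_with_seed(s) for s in (0, 1, 2)]   # three runs, as reported
print(f"mean accuracy over 3 runs: {np.mean(scores):.4f}")
```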

Q.4: Also, the advantages and disadvantages of this novel approach should be added.

A.4: The advantages of our scheme are mentioned in the abstract and introduction of the article, and are described in detail in Chapter 4, Section 3. In summary, there are two points:

  1. We added an attention mechanism to the traditional graph convolutional network to enhance its ability to capture relevant node information and to reduce the impact of noisy information (an illustrative sketch follows this list).
  2. We introduced a multi-head attention mechanism that lets the contextual semantic features captured by the attention coding layer interact with the syntactic features captured by the AEGCN layer, alleviating the impact of dependency-tree instability.
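To make the first point concrete, below is a minimal, hypothetical sketch of an attention-enhanced GCN layer in the spirit described above; the dimensions, naming, and masking scheme are illustrative assumptions, not the authors' exact implementation:

```python
# Sketch: a GCN layer whose aggregation over the dependency-tree
# adjacency matrix is reweighted by learned attention scores, so that
# noisy neighbours contribute less. Illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionEnhancedGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim)          # GCN feature transform
        self.attn = nn.Linear(2 * dim, 1)     # scores each edge (i, j)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (batch, n, dim) node features; adj: (batch, n, n) 0/1 adjacency
        n = h.size(1)
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)   # (batch, n, n, dim)
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)
        e = self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1)  # (batch, n, n)
        e = e.masked_fill(adj == 0, float("-inf"))  # keep only tree edges
        alpha = torch.nan_to_num(torch.softmax(e, dim=-1))  # rows w/o edges -> 0
        return F.relu(alpha @ self.W(h))            # attention-weighted aggregation

# Usage on random inputs with a self-loop-only adjacency:
layer = AttentionEnhancedGCNLayer(dim=300)
out = layer(torch.randn(2, 10, 300), torch.eye(10).expand(2, -1, -1))
```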

To keep the article from becoming redundant, we did not add a further description of the advantages.

The shortcomings of this scheme are mentioned in Chapter 5 (page 11, line 380). Our model only lets semantic and syntactic information interact in the last layer; it is possible that the two could guide each other's learning throughout the network to obtain better representations. The focus of this article is to study and solve the two problems caused by the limitations of the dependency tree itself; other shortcomings of our model will be addressed in subsequent research.

Q.5: Moderate English changes required.

A.5: We checked and corrected the grammatical errors throughout the text.

Author Response File: Author Response.docx

Reviewer 2 Report

The work is good.

Authors should improve the descriptions of the GloVe technique and the bidirectional LSTM (BiLSTM) before introducing them. Just a few sentences.

Avoid talking about toolkits (spaCy); describe what they do formally.


Author Response

Thank you very much for your valuable suggestions. Our responses to your comments are as follows:

Q.1: Authors should improve the descriptions of the GloVe technique and the bidirectional LSTM (BiLSTM) before introducing them. Just a few sentences.

A.1: We added brief descriptions of GloVe and BiLSTM in the first section of Chapter 3 (page 4, lines 151-153 and 156-159). Since GloVe and BiLSTM are widely used components in this field, we introduce them in only a few sentences to avoid making the article redundant.
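For context, the standard GloVe-plus-BiLSTM encoding step referred to above looks roughly like this sketch; the vocabulary size, dimensions, and random embedding weights are placeholders (in practice the embedding rows would be loaded from a pre-trained GloVe file):

```python
# Sketch of the standard GloVe + BiLSTM encoding pipeline.
# The embedding matrix would normally be filled from a pre-trained GloVe
# file (e.g. glove.840B.300d.txt); here it is random for self-containment.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 10000, 300, 150
embedding = nn.Embedding(vocab_size, emb_dim)   # rows = GloVe vectors in practice
bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

tokens = torch.randint(0, vocab_size, (1, 12))  # one 12-token sentence (ids)
h, _ = bilstm(embedding(tokens))                # h: (1, 12, 2*hidden)
# Each position now carries left-to-right and right-to-left context,
# which is why BiLSTM outputs feed the attention layers downstream.
```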

Q.2: Avoid talking about toolkits (spaCy); describe what they do formally.

A.2: Perhaps our explanation was not clear, which led to a misunderstanding. We use the spaCy toolkit only to obtain the dependency tree of each sentence; the subsequent process of deriving the adjacency matrix from the dependency tree is our own work. We have revised the wording of this part (page 6, lines 200-202).
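A minimal sketch of that division of labour, assuming the common symmetric-with-self-loops construction used in GCN-based ABSA work (the exact construction in the paper may differ):

```python
# spaCy supplies the dependency tree; the adjacency matrix is then
# derived from it. Symmetric edges plus self-loops are a common choice.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")  # requires the small English model
doc = nlp("The food was great but the service was slow")

n = len(doc)
adj = np.eye(n, dtype=np.float32)          # self-loops
for token in doc:
    if token.i != token.head.i:            # the root points to itself in spaCy
        adj[token.i, token.head.i] = 1.0   # child -> head
        adj[token.head.i, token.i] = 1.0   # head -> child (undirected)
```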

Q.3: English language and style are fine/minor spell check required.

A.3: We checked and corrected the grammar and spelling errors in the paper.

Author Response File: Author Response.docx

Reviewer 3 Report

- Please address the previous works as follows:
Zhu, Y., Zheng, W., & Tang, H. (2020). Interactive Dual Attention Network for Text Sentiment Classification. Computational Intelligence and Neuroscience, 2020.
Sun, J., Han, P., Cheng, Z., Wu, E., & Wang, W. (2020). Transformer Based Multi-Grained Attention Network for Aspect-Based Sentiment Analysis. IEEE Access, 8, 211152-211163.
Si, C., Chen, W., Wang, W., Wang, L., & Tan, T. (2019). An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1227-1236).
Zhang, Q., Lu, R., Wang, Q., Zhu, Z., & Liu, P. (2019). Interactive multi-head attention networks for aspect-level sentiment classification. IEEE Access, 7, 160017-160028.

- Very similar ideas using variants of the transformer for the aspect-level sentiment classification task have appeared in the past two years. Authors should enhance the introduction by including similar recent works and clarifying the differences from those ideas, to justify the paper's unique contributions with regard to multi-head attention networks.
- Accordingly, please add some relevant ones to the performance comparison in the result section.
- Please clarify Figure 1; it is not easy to understand this figure without a detailed description.
- L116: please fix this "a relational GA T" to be clear.
- Eq(1): state what is the h in head_h
- Can you discuss how the model deals with biases (or errors) in the spaCy outputs, since the spaCy toolkit is mainly used to construct a dependency tree for each sentence?
- In Tables 2 and 3, are the differences in the results statistically significant?
- It would be greatly helpful if the source code were publicly available.

Author Response

First of all, thank you very much for your valuable suggestions, which helped us a lot. Our responses follow.

Q.1: Please address the previous works as follows: ……

A.1: In the related work (page 3, lines 104-112), we have added an introduction to the work of Zhu et al. (2020), Sun et al. (2020), and Zhang et al. (2019). We did not add the work of Si et al. (2019) for two reasons. First, we are not familiar with the image field and could not fully understand that paper in a short time, and its training datasets differ from ours, so the experimental results cannot be compared. Second, judging from the formula part of their model, their work improves the LSTM structure to suit their task, whereas our work improves the graph convolutional network and introduces a multi-head attention mechanism to solve the two problems caused by the limitations of the dependency tree. Although the names are somewhat similar, the actual structures are completely different, so we did not include this work. In follow-up research we may consider transferring this improved LSTM structure to the field of natural language processing.

Q.2: Authors should enhance the introduction by including similar recent works and clarifying the differences from those ideas, to justify the paper's unique contributions with regard to multi-head attention networks.

A.2: The purpose of our paper is to improve the traditional graph convolutional network to solve the problems caused by the limitations of the dependency tree. Our contribution is not a new multi-head attention structure but the use of multi-head attention to solve these problems. Therefore, the comparison focuses on other models that use sentence structure information, rather than spending space on models based purely on multi-head attention. For these reasons, we added only three of the papers you recommended.

Q.3: Accordingly, please add some relevant ones to the performance comparison in the result section.

A.3: In the second section of Chapter 4 (page 9, lines 278-294), we added the work of Sun et al. (2020) and Zhang et al. (2019), but not the work of Zhu et al. (2020), because the task they address and the datasets they use differ from ours. In the Results and Analysis chapters, we compare and analyze their experimental results (page 10, lines 330-342).

Q.4: Please clarify Figure 1; it is not easy to understand this figure without a detailed description.

A.4: We believe the difficulty arose because we had not added descriptions of the two components, the Attention Coding Layer (ACL) and AEGCN. We have therefore added structure diagrams of the corresponding components in the second and third sections of Chapter 3 (page 5, Figure 2; page 6, Figure 3). We also checked that the input and output of each layer are clear, and added an explanation of HA in line 192 on page 5. In addition, we changed "Attention Encoder Layer" in Figure 1 to "Attention Coding Layer", since the inconsistency between the terms in Figure 1 and the later detailed description of the model may also have caused confusion (page 4, Figure 1).

Q.5: L116: please fix this "a relational GA T" to be clear.

A.5: We deleted the extra space in the abbreviation GAT and added the full term corresponding to the abbreviation (page 3, line 123).

Q.6: Eq. (1): state what the h in head_h is.

A.6: We added an explanation of "h" in line 185 on page 5.
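For readers of this exchange, Eq. (1) presumably follows the standard multi-head attention formulation of Vaswani et al. (2017), in which h denotes the number of attention heads:

```latex
% Standard multi-head attention; h is the number of attention heads.
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}\!\left(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V}\right)
```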

Q.7: Can you discuss how the model deals with biases (or errors) in the spaCy outputs, since the spaCy toolkit is mainly used to construct a dependency tree for each sentence?

A.7: Our model addresses two problems, the first of which corresponds to your question. When the parser does not produce the correct dependency tree (because of the parser's limited accuracy or an ungrammatical input sentence), we cannot obtain the correct adjacency matrix, and the AEGCN layer cannot produce the correct output. For this problem we introduced a multi-head attention mechanism: the attention coding layer (ACL) first extracts further semantic information, and two multi-head interactive attention modules then let the semantic and syntactic information interact. The outputs of the two modules are concatenated as the final feature vector, so the model does not rely solely on sentence structure information, which alleviates the problems caused by dependency-tree errors. A detailed description of how the model solves this problem is already given in the introduction (page 2, lines 51-65), so no modification to the paper was needed.
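A hedged sketch of that fusion step, built from PyTorch's nn.MultiheadAttention for illustration (the paper's exact parametrization may differ; dimensions are placeholders):

```python
# Two multi-head interactive (cross-)attention modules: semantic features
# (from the attention coding layer) and syntactic features (from the
# AEGCN layer) attend to each other; their outputs are concatenated.
import torch
import torch.nn as nn

dim, heads, n = 300, 6, 12
sem2syn = nn.MultiheadAttention(dim, heads, batch_first=True)
syn2sem = nn.MultiheadAttention(dim, heads, batch_first=True)

semantic = torch.randn(1, n, dim)   # stand-in for attention coding layer output
syntactic = torch.randn(1, n, dim)  # stand-in for AEGCN layer output

a, _ = sem2syn(query=semantic, key=syntactic, value=syntactic)
b, _ = syn2sem(query=syntactic, key=semantic, value=semantic)
fused = torch.cat([a, b], dim=-1)   # (1, n, 2*dim) final feature vector
# Because `fused` also carries purely semantic evidence, a wrong
# dependency tree cannot fully corrupt the representation.
```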

Q.8: In Tables 2 and 3, are the differences in the results statistically significant?

A.8: We understand your question as asking whether our experimental results are meaningful rather than random outcomes caused by sampling error or other problems. As mentioned on page 8, line 260, all experimental results of our model are the average of three runs with random initialization, so they do not reflect a single chance run.

Q.9: It would be greatly helpful if the source code were publicly available.

A.9: Our ongoing work builds on this article, so we will not open-source the code for the time being.

Q.10: English language and style are fine/minor spell check required.

A.10: We checked and corrected the grammar and spelling errors in the paper.

Author Response File: Author Response.docx
