Article

RAG-TCGCN: Aspect Sentiment Analysis Based on Residual Attention Gating and Three-Channel Graph Convolutional Networks

Huan Xu, Shuxian Liu, Wei Wang and Le Deng

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(23), 12108; https://doi.org/10.3390/app122312108
Submission received: 5 August 2022 / Revised: 20 October 2022 / Accepted: 23 November 2022 / Published: 26 November 2022

Abstract

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task that judges the sentiment polarity of a given aspect term in a review. Current methods mainly use graph networks for aspect-level sentiment classification; most build syntactic or semantic graphs and use attention mechanisms to let aspect terms and context interact, yielding more useful feature representations. However, these methods may overlook sentences whose syntactic structure is insignificant or whose information is implicit, and the attention mechanism easily loses the original information, which ultimately leads to inaccurate sentiment analysis. To solve this problem, this paper proposes a model based on residual attention gating and a three-channel graph convolutional network (RAG-TCGCN). The model uses a three-channel network composed of syntactic, semantic, and common information, optimized and fused simultaneously through a multi-head attention mechanism, to handle sentences without a significant syntactic structure or with implicit information. A residual attention gating mechanism then addresses the loss of original information. Experiments show that the model improves both accuracy and F1 on three public datasets.

1. Introduction

Sentiment analysis is an important branch of natural language processing [1,2,3] that has attracted increasing attention in recent years. It is considered a foundation for powerful artificial intelligence and a prerequisite for machines to fully understand human language, making it a key technique. Sentiment analysis can be divided into two types: fine-grained and coarse-grained [4]. Coarse-grained sentiment analysis can be further divided into document-level and sentence-level analysis [5]. Early sentiment analysis efforts were mainly document-level and sentence-level, designed to detect the sentiment polarity (e.g., positive, neutral, or negative) of an entire document or sentence. However, this approach yields inaccurate results for sentences with multiple polarities. For example, the sentence "The food at this restaurant is great, but the service is terrible!" is likely to be classified as neutral by document-level and sentence-level sentiment classification models, whereas a person would recognize that it expresses a positive polarity toward "food" and a negative one toward "service". Based on this observation, many researchers began to conduct ABSA research. ABSA is a fine-grained sentiment analysis method. Current ABSA research follows two major trends: improving the accuracy of sentiment analysis by refining known models [6,7,8], and detecting aspects and sentiments simultaneously [9,10].
Usually, aspect-based sentiment analysis involves multiple steps, such as obtaining word embeddings, encoding syntactic information, and extracting semantic information, among which mining the most relevant opinion words is the key. With the successful application of the attention mechanism, some researchers combined it with neural networks to complete various tasks and achieved good results, and a large body of subsequent work used attention-based neural networks for sentiment analysis [11,12]. However, the attention mechanism introduces noise, causing the model to attend to context words that are not relevant to the current aspect [13,14]. For example, in the comment "The food is delicious but the service was mediocre", when analyzing the sentiment polarity of "food", we want the model to focus only on the sentiment words related to "food" in "the food is delicious", rather than on sentiment words unrelated to "food" elsewhere in the review. Recently, the rapid development of graph neural networks has attracted great interest, and a class of graph neural networks has been designed to extract syntactic information from dependency trees, owing to their strong ability to learn structural representations [15,16,17,18,19]. Although these models improve considerably on attention-based models, their shortcomings cannot be ignored: sentences differ in their sensitivity to syntactic and semantic information, and sentences without an obvious syntactic structure are especially insensitive to syntactic information, meaning that in some cases syntactic information may not help the model determine a sentence's sentiment polarity.
In response to the above problems, we propose the RAG-TCGCN model, which uses a three-channel network composed of syntactic information, semantic information, and public information to simultaneously optimize and fuse through a multi-head attention mechanism to solve the problem of sentences without significant syntactic structure and with some implicit information. The attention mechanism can capture words related to the emotional expression of the aspect term, and the essence of the attention mechanism is to increase the weight of some information and weaken others [20]. However, the part weakened by the attention mechanism may lead to the loss of some useful information, and the attention mechanism will bring noise. Therefore, we use the residual attention gating mechanism to solve this problem, which addresses noise through gating and alleviates information loss through a residual mechanism.
The main contributions of this paper are as follows:
1. For the aspect-level sentiment analysis task, a model based on residual attention gating and a three-channel graph convolutional network (RAG-TCGCN) was designed, in which the three-channel graph convolutional network can adaptively learn important information according to the characteristics of sentences.
2. We employed a Multi-Head Self-Attention (MHSA) mechanism to capture the key information output by the three-channel graph convolution to refine the aspect-wise sentence features.
3. We propose a residual attention gating mechanism that captures the words most important to the sentiment of the current aspect term. This mechanism also mitigates the attention mechanism's loss of original information, reducing information loss and making the model's information extraction more complete and useful.
4. Experiments show the importance of properly using syntactic information, semantic information, and their combination, as well as the necessity of the residual attention gating mechanism, and demonstrate the effectiveness and robustness of residual attention gating with a three-channel graph convolutional network in sentiment classification tasks.
The rest of this paper is organized as follows. In Section 2, we briefly review related work on aspect-based sentiment analysis. In Section 3, the RAG-TCGCN model details are presented. Section 4 evaluates the model and analyzes the experimental results on three public datasets. Section 5 concludes the paper.

2. Related Work

Traditional sentiment analysis tasks are mainly document-level and sentence-level classification. In contrast, aspect-level sentiment analysis is an entity-oriented, fine-grained task whose goal is to extract the sentiment polarity of each aspect of an entity. Commonly used methods include sentiment dictionaries, machine learning, and deep learning. We compare the advantages and disadvantages of these methods in Table 1. The specific methods are detailed as follows:
Using a sentiment dictionary to solve the aspect-level sentiment analysis task is closest to the traditional approach: a labeled sentiment dictionary is used to determine sentiment polarity. Common sentiment dictionaries include SentiWordNet [21], NTUSD [22], and HowNet [23]. Nguyen et al. [24] further optimized the dictionary-based approach, achieving better performance by using tree kernels to recognize the relations connecting aspects to opinions. Ding et al. [25] proposed a dictionary-based method that extracts the corresponding sentiment polarity for both explicit and implicit aspects.
Compared with sentiment dictionary methods, machine learning methods are used more widely in aspect-level sentiment analysis. Traditional machine learning algorithms mainly include Naive Bayes (NB), Maximum Entropy (ME), Logistic Regression (LR), and the Support Vector Machine (SVM). Pang et al. [26] analyzed a corpus of movie review texts and found that the SVM and Naive Bayes classifiers produce better classification results. Waila et al. [27] used Bayesian and SVM classifiers, together with the semantics-based SO-PMI-IR algorithm, to analyze movie reviews. Zhao et al. [28] applied machine learning to sentiment analysis of review texts in the catering field using Naive Bayes and C4.5.
Deep learning methods are widely used in aspect-level sentiment analysis, and research in this area can be roughly divided into three categories: simple neural networks, attention-based neural networks, and graph convolutional networks. Simple neural networks mainly include the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Long Short-Term Memory (LSTM) network. Chen [29] proposed the TextCNN model, applying CNNs to text classification for the first time; sentences are processed at the word level so that the model can extract text features itself, but sentence context information cannot be learned. Zhang et al. [30] used an RNN to train sentence vectors of fixed dimensions for sentences of different lengths, adding sequential word features and enriching the sentence-vector representation, but this model suffers from exploding and vanishing gradients. Nowak et al. [31] proposed using an LSTM network for sentiment classification, which effectively overcomes the RNN's loss of sentence information over long distances but cannot process text in parallel. The emergence of attention-based neural networks has driven further progress in sentiment analysis; the main idea is to capture and establish the connection between aspects and opinion words through the attention mechanism. Strictly speaking, attention-based models also exploit sentence structure information, since the distance between aspects and their opinion words is generally not large. Wang et al. [32] proposed an attention-based approach to acquire meaningful information for specific aspects. Given the advantages of multi-head attention in modeling contextual semantic relations, Song et al. [33] proposed an attention encoder network to model the hidden states and semantic interactions between target words and context words. Chen et al. [34] addressed long-distance dependencies between aspects and opinion words with a multi-layer attention network, and Li et al. [35] introduced a multi-layer attention mechanism to capture distant opinion words. Since graph convolutional networks [36] were first proposed, their excellent performance on graph-structured information has led to rapid adoption across natural language processing tasks, with strong results in aspect-level sentiment analysis. Zhang et al. [15] transformed the syntactic dependency tree into a graph structure and used the syntax to extract information. Sun et al. [16] proposed using graph convolutional networks to learn node representations on dependency trees and used the resulting representations, along with other features, for sentiment classification. Wang et al. [37] defined a unified aspect-oriented dependency tree structure and proposed a relational graph attention network (R-GAT) to encode the new tree structure for sentiment prediction. Huang et al. [38] used a dependency graph to propagate sentiment features directly from the syntactic context of aspect targets. Zhao et al. [8] constructed a graph based on the sentiment similarity between aspects in a sentence and then combined a graph convolutional network with an attention mechanism to extract the sentiment dependencies between different aspects.

3. Methodology

In this section, the proposed RAG-TCGCN model is introduced in detail; its overall framework is shown in Figure 1. Inspired by DualGCN [39], the key idea of this paper is to add a common information sharing channel alongside the syntactic and semantic graphs and then apply a residual attention gating mechanism, forming a three-channel graph convolutional network based on residual attention gating, which addresses the problems described above. The RAG-TCGCN framework mainly includes an input layer, a graph convolution module, an attention module, and an output layer; the graph convolution module consists of syntactic, semantic, and common information modules. Each component of the model is described in detail below.

3.1. Input Layer

The input layer is mainly composed of word embedding and sentence encoding.
(1) Word embedding: Given a sentence of n words, $s = \{w_1, w_2, \ldots, w_{T+1}, \ldots, w_{T+m}, \ldots, w_{n-1}, w_n\}$, where $\{w_{T+1}, \ldots, w_{T+m}\}$ is the aspect term consisting of m words, each word is looked up in the word embedding matrix $L \in \mathbb{R}^{|V| \times d_w}$ of pre-trained GloVe vectors [40], where $|V|$ is the vocabulary size and $d_w$ is the word embedding dimension. Each word is thus embedded as a low-dimensional vector.
(2) Sentence encoding: A bidirectional LSTM is used for encoding, from which the hidden representation of the given sentence s is extracted. The forward and backward LSTM hidden state sequences are concatenated into one output, expressed as follows:
$\overrightarrow{H} = \{\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_{T+1}, \ldots, \overrightarrow{h}_{T+m}, \ldots, \overrightarrow{h}_{n-1}, \overrightarrow{h}_n\}$

$\overleftarrow{H} = \{\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_{T+1}, \ldots, \overleftarrow{h}_{T+m}, \ldots, \overleftarrow{h}_{n-1}, \overleftarrow{h}_n\}$

$H = [\overrightarrow{H}, \overleftarrow{H}]$

where $\overrightarrow{H}$ denotes the forward representation, $\overleftarrow{H}$ the backward representation, and $H$ the final representation encoded by the bidirectional LSTM, which contains the contextual information between aspect words and opinion words.
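As a concrete sketch (not the authors' released code), the input layer can be written in PyTorch; the 300-dimensional GloVe embeddings and hidden size of 50 follow Table 4, while the class and variable names are illustrative:

```python
import torch
import torch.nn as nn

class InputLayer(nn.Module):
    """GloVe embedding lookup followed by a BiLSTM encoder (sketch)."""
    def __init__(self, glove_weights, hidden_dim=50):
        super().__init__()
        # glove_weights: (|V|, 300) tensor of pre-trained GloVe vectors
        self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.bilstm = nn.LSTM(input_size=glove_weights.size(1),
                              hidden_size=hidden_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        x = self.embed(token_ids)  # (batch, n, 300)
        # H concatenates the forward and backward hidden states per token.
        H, _ = self.bilstm(x)      # (batch, n, 2 * hidden_dim)
        return H
```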

3.2. Syntactic Graph Convolution Module

Sun et al. [16] proposed encoding the dependency tree and extending the GCN model with the dependency paths between words. This method is therefore used to convert the dependency tree into a graph structure $G^{sy} = (A^{sy}, H^{sy})$, where $A^{sy}$ is the adjacency matrix of the graph and $H^{sy}$ is the feature matrix. A graph convolutional network is then used to extract syntactic information, formulated as follows:
$H_i^{l+1} = \mathrm{ReLU}\Big(\sum_{j=1}^{n} A_{ij} W^{l+1} H_j^{l} + b^{l+1}\Big)$

$H^{1} = H$

$H^{sy} = \{H_1^{sy}, H_2^{sy}, \ldots, H_n^{sy}\}$

where $A_{ij}$ is an entry of the adjacency matrix, $W^{l+1}$ is the weight matrix, $b^{l+1}$ is the bias, $H^{1} = H$ takes the hidden state vector $H$ from the BiLSTM as the initial node representation of the syntactic graph, $H_i^{sy}$ is the hidden representation of the i-th node, and $H^{sy}$ is the syntactic information representation.
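A minimal PyTorch sketch of this graph convolution follows; `adj` stands for the adjacency matrix built from the dependency parse, the bias $b^{l+1}$ is folded into `nn.Linear`, and all names are illustrative rather than the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution step: H^{l+1} = ReLU(A H^l W^{l+1} + b^{l+1})."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # holds W^{l+1} and b^{l+1}

    def forward(self, H, adj):
        # adj: (batch, n, n) dependency-tree adjacency; the sum over
        # neighbors j in the equation is the matrix product adj @ (...).
        return F.relu(adj @ self.linear(H))
```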

3.3. Semantic Graph Convolution Module

Some short sentences have ambiguous syntactic structures, so rigidly extracting syntactic information may lead to inaccurate predictions. Moreover, the syntactic structure is fixed: aspects may lie far from their opinion words, or two words that should be connected in the dependency tree may not be, causing important information to be missed. Inspired by Vaswani et al. [41], this paper designs a semantic graph module that directly captures the semantically related information of each word in the sentence through the attention mechanism; unlike the syntactic graph, it requires no additional syntactic knowledge. This makes it more flexible than the syntactic graph structure and better able to handle sentences that are insensitive to syntactic information. The semantic information is formulated as follows:
$A^{se} = \mathrm{softmax}\Big(\dfrac{U W^{u} (P W^{p})^{T}}{\sqrt{d_m}}\Big)$

$H^{se} = \mathrm{ReLU}\Big(\sum_{j=1}^{n} A^{se} W^{l+1} H_j^{l} + b^{l+1}\Big)$

where $A^{se}$ is the attention score matrix, used as the adjacency matrix of the semantic module; $W^{u}$ and $W^{p}$ are learnable weight matrices; $d_m$ is the dimension of the input node features; $W^{l+1}$ is the weight matrix; $b^{l+1}$ is the bias; and $H^{se}$ is the semantic information representation.
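A sketch of the score computation (assuming, per the scaled dot-product formulation it is inspired by, that U and P are both the node feature matrix output by the encoder; names are illustrative):

```python
import torch
import torch.nn.functional as F

def semantic_adjacency(H, W_u, W_p):
    """Attention-score matrix used as the semantic adjacency A^{se} (sketch).

    H: (batch, n, d) node features; W_u, W_p: (d, d) learnable matrices.
    """
    d_m = H.size(-1)
    scores = (H @ W_u) @ (H @ W_p).transpose(-1, -2) / d_m ** 0.5
    return F.softmax(scores, dim=-1)  # row-stochastic (batch, n, n) adjacency
```

The resulting $A^{se}$ is then fed to the same graph convolution as in Section 3.2, in place of the syntactic adjacency.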

3.4. Common Graph Convolution Module

According to Pylkkänen [42], syntactic and semantic spaces are not completely separate but are related to each other: as the syntactic structure of a sentence changes, its semantics change accordingly. Therefore, to better understand a sentence, it is necessary to model the common information shared by the syntactic and semantic spaces. Inspired by Wang et al. [18], a common graph convolution module with a parameter-sharing strategy is used to obtain the information shared by both spaces. The common information is formulated as follows:
$H_{csy} = H^{sy}$

$H_{cse} = H^{se}$

$H^{com} = \dfrac{\lambda H_{csy} + \delta H_{cse}}{2}$

where $H_{csy}$ denotes the common syntactic representation, $H_{cse}$ the common semantic representation, $H^{com}$ the common information representation, and $\lambda$ and $\delta$ are trainable parameters.
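The fusion itself is a two-parameter weighted average; a minimal sketch follows (initializing $\lambda$ and $\delta$ to 1.0 is our assumption, as is every name):

```python
import torch
import torch.nn as nn

class CommonFusion(nn.Module):
    """Trainable weighted average of the shared syntactic/semantic views."""
    def __init__(self):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(1.0))    # lambda
        self.delta = nn.Parameter(torch.tensor(1.0))  # delta

    def forward(self, H_csy, H_cse):
        # H_csy and H_cse come from GCN layers whose weights are shared
        # with the syntactic and semantic channels (parameter sharing).
        return (self.lam * H_csy + self.delta * H_cse) / 2
```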

3.5. Attention Module

In this section, attention is divided into multi-head self-attention and residual-based gated attention. Multi-head self-attention fuses the three-channel graph outputs to refine the features again, while residual-based gated attention directly introduces the input information into the output, reducing information loss and preventing network degradation. We therefore use this attention-based gated residual structure to control the information flow, making information extraction more complete and useful.

3.5.1. Multi-Head Self-Attention

Multi-head self-attention (MHSA) uses multiple attention heads to capture information in parallel. Each head focuses on different aspects of the input, and the outputs of all heads are finally combined to obtain a more refined feature representation. Taking three identical copies of $H^{three} = [H^{sy}; H^{com}; H^{se}]$ as input, MHSA is defined as:
$\mathrm{Attention}_i(H^{three}, H^{three}, H^{three}) = \mathrm{softmax}\Big(\dfrac{H^{three} (H^{three})^{T}}{\sqrt{d_k}}\Big) H^{three}$

$\mathrm{head}_i = \mathrm{Attention}_i(H^{three}, H^{three}, H^{three})$

$\mathrm{MHSA}(H^{three}, H^{three}, H^{three}) = (\mathrm{head}_1 \oplus \mathrm{head}_2 \oplus \cdots \oplus \mathrm{head}_h)\, W^{O}$

$H^{left} = \mathrm{MHSA}(H^{three}, H^{three}, H^{three})$
where h is the number of attention heads, $\oplus$ denotes vector concatenation, $W^{O}$ is a parameter matrix, $\mathrm{head}_i$ is the output of the i-th attention head, and $H^{left}$ is the refined feature representation after fusion.
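PyTorch's built-in multi-head attention already performs the per-head split, concatenation, and $W^{O}$ projection of the equations above, so the fusion step can be sketched as follows (all sizes are illustrative stand-ins, not the paper's settings):

```python
import torch
import torch.nn as nn

batch, n, d = 2, 10, 100               # illustrative sizes
H_sy = torch.randn(batch, n, d)        # stand-ins for the channel outputs
H_com = torch.randn(batch, n, d)
H_se = torch.randn(batch, n, d)

# H^{three} = [H^{sy}; H^{com}; H^{se}] serves as query, key, and value.
H_three = torch.cat([H_sy, H_com, H_se], dim=-1)   # (batch, n, 3*d)

mhsa = nn.MultiheadAttention(embed_dim=3 * d, num_heads=4, batch_first=True)
H_left, _ = mhsa(H_three, H_three, H_three)        # refined fused features
```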

3.5.2. Residual Attention Gating

In a text, each word contributes differently to the sentiment expressed toward the aspect word, and generally only a few words play a decisive role. As noted above, the residual mechanism directly introduces the input information into the output, reducing information loss and preventing network degradation, so we combine it with gated attention to control the information flow. Attention gating is defined as follows:
$O^{att} = \mathrm{softmax}\Big(\dfrac{q k^{T}}{\sqrt{d_k}}\Big)\, v$

$\mathrm{Gat} = \mathrm{sigmoid}(O^{att})$

$H^{right} = \mathrm{mul}(O^{att}, \mathrm{Gat})$
where q, k, and v are linear transformations of the LSTM output, $\mathrm{mul}(\cdot)$ denotes element-wise multiplication, and $H^{right}$ is the processed representation of the original text information.
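A sketch of the gate (the linear maps producing q, k, and v and their shared size d are our assumptions, and the softmax-then-value ordering follows the standard scaled dot-product reading of the first equation above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAttentionGate(nn.Module):
    """Gated self-attention over the BiLSTM output H (sketch of Sec. 3.5.2)."""
    def __init__(self, d):
        super().__init__()
        self.Wq = nn.Linear(d, d)   # linear transforms of the LSTM output
        self.Wk = nn.Linear(d, d)
        self.Wv = nn.Linear(d, d)

    def forward(self, H):
        q, k, v = self.Wq(H), self.Wk(H), self.Wv(H)
        d_k = q.size(-1)
        O_att = F.softmax(q @ k.transpose(-1, -2) / d_k ** 0.5, dim=-1) @ v
        gate = torch.sigmoid(O_att)   # Gat = sigmoid(O^{att})
        # H^{right} = mul(O^{att}, Gat): element-wise gating of the attention
        # output, later added back to the fused features (residual path).
        return O_att * gate
```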

3.6. Output Layer

In this part, the information of the syntactic, semantic, and common graphs, refined through multi-head self-attention, is fused with the specially processed original text information, and the sentiment classification result is output. Before feature fusion, the corresponding aspect vectors $h_a^{left}$ and $h^{right}$ are obtained from $H^{left}$ and $H^{right}$ using average pooling. The formula is as follows:
$h_a^{left} = f(H^{left})$

$h^{right} = f(H^{right})$

$h_a = h_a^{left} + h^{right}$
where $f(\cdot)$ is the average pooling function, and the last equation is the residual connection, which combines the three-channel feature representation from multi-head self-attention with the specially processed original input information to generate a more useful and complete feature representation $h_a$. The probability P of each sentiment polarity is then obtained through the softmax function:
$P = \mathrm{softmax}(h_a W + b)$
where W is the learnable weight and b is the bias. Finally, the model is optimized by minimizing the cross-entropy loss L:

$L = -\sum_{i}^{c} \sum_{j}^{z} P \log \hat{P}$
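The output layer can be sketched end to end as below; `aspect_mask`, which restricts the average pooling in $h_a^{left}$ to the aspect positions, reflects our reading of "corresponding aspect vectors", and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def aspect_logits(H_left, H_right, aspect_mask, classifier):
    """Average-pool, apply the residual connection, and score polarities."""
    mask = aspect_mask.unsqueeze(-1).float()             # (batch, n, 1)
    h_a_left = (H_left * mask).sum(1) / mask.sum(1).clamp(min=1.0)
    h_right = H_right.mean(dim=1)      # pooled original-text representation
    h_a = h_a_left + h_right           # residual connection
    return classifier(h_a)             # h_a W + b

# Illustrative usage: P = softmax(logits); L is the cross-entropy loss.
classifier = nn.Linear(300, 3)         # 3 polarities
logits = aspect_logits(torch.randn(2, 10, 300), torch.randn(2, 10, 300),
                       torch.ones(2, 10), classifier)
P = F.softmax(logits, dim=-1)
loss = F.cross_entropy(logits, torch.tensor([0, 2]))
```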

4. Experiment

4.1. Datasets

This paper selected three public datasets, Laptop, Restaurant, and Twitter comments [43], for experiments. Laptop and Restaurant were selected from the SemEval-2014 Task [44]. The three datasets contain three different sentiment polarities, negative, neutral, and positive, and each sentence in these datasets is labeled with an aspect and its corresponding polarity. The specific data distribution is shown in Table 2.

4.2. Implementation and Parameter Settings

The experimental environment of this paper is shown in Table 3. For all experiments, pre-trained 300-dimensional GloVe word vectors were used to initialize the word embeddings, with a 1-layer BiLSTM and a 2-layer GCN, optimized using the Adam optimizer. The remaining experimental parameters are shown in Table 4.

4.3. Baseline Methods

In this paper, the following mainstream models were selected for comparison and run in the same experimental environment. The results are shown in Table 5.
(1) IAN [12]: Simultaneously models aspect words and context, fusing them interactively with attention.
(2) AOA [45]: Simultaneously models aspects and text with long short-term memory networks to focus on the important parts of sentences.
(3) RAM [34]: Enhances MemNet with a deep BiLSTM and position weighting; a gated recurrent unit network nonlinearly combines multiple attention results to obtain the final feature representation.
(4) MGAN [46]: Proposes fine-grained attention to compensate for the loss caused by coarse-grained attention, then combines both to predict sentence sentiment polarity.
(5) ASGCN [15]: Builds GCNs on syntactic dependency trees and generates aspect-oriented sentence representations through masking and an attention mechanism. Two variants are proposed: ASGCN-DG, based on an undirected dependency graph, and ASGCN-DT, based on a directed dependency tree.
(6) CDT [16]: Uses a BiLSTM to obtain sentence feature representations and further enhances the embeddings by convolving directly over the dependency tree.
(7) BiGCN [47]: Builds a conceptual hierarchy on syntactic and lexical graphs to distinguish various types of dependency or lexical word pairs, and designs a two-layer interactive graph convolutional network to take full advantage of both graphs.
(8) R-GAT [37]: Reshapes the dependency parse tree into an aspect-oriented tree rooted at the aspect term and encodes it with a relational graph attention network.
(9) DGEDT [6]: Designs a dual-transformer structure for mutual enhancement between flat representations learned by the transformer and graph-based representations.
(10) DualGCN [39]: A dual-graph convolutional network that considers both the complementarity of the syntactic structure and semantic relevance.
The comparison models selected in this paper can be divided into two categories: attention mechanism and graph neural network. IAN, AOA, RAM, and MGAN mainly use an attention mechanism to complete classification tasks. ASGCN, CDT, BiGCN, R-GAT, DGEDT, and DualGCN all use dependency trees to obtain syntactic information and utilize graph neural networks for classification tasks.

4.4. Comparative Results and Analysis

The results of comparison with all baseline models are shown in Table 5. We drew the following conclusions:
(1) On the Restaurant, Laptop, and Twitter datasets, our model outperforms both the attention-based and syntax-based models, showing that RAG-TCGCN encodes syntactic and semantic information better through their adaptive fusion.
(2) Across the three datasets, syntax-based and attention-based models perform quite differently: the syntax-based models (ASGCN, CDT, BiGCN) are superior to the attention-based models (IAN, AOA, RAM). The main reason is that syntactic structure captures the relationship between aspects and the corresponding sentiment words more effectively and extracts more useful information.
(3) Compared with the attention-based models (MGAN, IAN), our model shows a significant improvement. The IAN model obtains initial feature representations through an LSTM and then relatively rich semantic features through an attention mechanism, so its effectiveness depends heavily on whether attention accurately connects aspects and context. Owing to sentence complexity and the inadequacy of attention in capturing long-distance dependencies, irrelevant information and noise are introduced, degrading performance. Our model instead constructs the relationship between aspects and opinion words using syntactic information, avoiding the noise introduced by the attention mechanism.
(4) Compared with the syntax-based models (R-GAT, DGEDT), our model also improves significantly. Syntax-based models mainly obtain local features through syntactic structures and establish word-word relationships, but they ignore global information and the semantic information between words; for sentences with obscure grammatical structure or complex sentences, extracting features by syntactic knowledge alone yields poor results.
(5) Compared with the DualGCN model, RAG-TCGCN improves accuracy by 0.32% and 0.74% on Laptop and Twitter, respectively, and F1 by 0.3% and 1.12%. Although DualGCN extracts both syntactic and semantic features, sentences differ in their sensitivity to syntax and semantics, and DualGCN cannot adaptively learn which information matters for a given sentence. On top of syntax and semantics, our model adds a common information channel to form a three-channel network that learns and fuses adaptively according to the characteristics of sentences, which allows it to obtain good results.

4.5. Ablation Study

For this section, further ablation studies were carried out to verify the effectiveness of each module in RAG-TCGCN; the results are shown in Table 6. SynGCN (syntactic graph convolutional network) denotes the syntactic feature channel; SemGCN (semantic graph convolutional network) denotes the semantic feature channel; Common denotes the common information channel; and TCGCN denotes the three-channel graph convolutional network.
Table 6 shows that, first, on the Restaurant and Laptop datasets, the syntax-based SynGCN model performs better than the semantics-based SemGCN model, suggesting that rich grammatical knowledge can reduce dependency parsing errors. SemGCN performs better than SynGCN on the Twitter data, mainly because the Twitter dataset contains more comments with obscure syntactic structure than the Restaurant and Laptop datasets. Second, on all three datasets, each single-channel model (SynGCN, SemGCN, or Common) performs worse than the three-channel graph convolutional network (TCGCN). The main reason is that, because of sentence complexity, each sentence has a different sensitivity to semantics and syntax, so a single-channel network struggles to adaptively extract the important information for each sentence. However, TCGCN still falls short of our full model, mainly because it over-refines features, losing some important original information. Our model adds a residual attention gating mechanism to reduce this loss. Overall, our model achieves the best performance.

4.6. Case Study

To better understand our model, we compared different models on several test examples; the results are shown in Table 7. The Attention Visualization column shows each model's attention scores, shaded from darker to lighter according to how high or low the score is. In the first example, "Great food but terrible service", the sentence contains two aspects, "food" and "service", and two sentiment words, "great" and "terrible". The attention-based IAN model fails to capture the connection between them, producing an inaccurate prediction. In the second and third examples, the IAN model again fails to correctly link aspect words and sentiment words because of the complexity of the sentences. Although the syntactic structure can directly connect an aspect word to its sentiment words, this connection does not exist in a sentence without an obvious syntactic structure. In the third example, the SynGCN model fails to capture the feature of the keyword "not", leading to an incorrect prediction, while the SemGCN and Common models focus on the semantic correlation between words, capture the "not" feature, and predict correctly. In the second example, the syntactic, semantic, and common information convolution modules all incorrectly assign the highest attention to "wonderful", treating it as a sentiment word for "dinner" and ultimately misjudging the sentence. Our proposed RAG-TCGCN model adaptively reduces the attention paid to the irrelevant word "wonderful" and increases the score of "dinner".

5. Conclusions

In this article, we proposed the RAG-TCGCN model for ABSA tasks. It addresses the shortcomings of attention-based models and of graph convolution over dependency trees. First, to alleviate the problems of traditional dependency methods and their lack of semantic information, we used a multi-head attention mechanism to assist and enhance the syntactic and semantic channels and obtain more important information. At the same time, to alleviate the noise introduced by the attention mechanism and the loss of original important information caused by over-refined features, we added a residual attention gating mechanism. Experimental results on three benchmark datasets show that our model outperforms the baseline methods, and ablation and case studies verify the role of each component of our proposed model. In the future, we plan to integrate domain knowledge so that the model can perform other classification tasks.

Author Contributions

Conceptualization, H.X. and S.L.; methodology, H.X.; software, L.D. and W.W.; validation, H.X.; formal analysis, H.X. and S.L.; investigation, W.W. and L.D.; data curation, S.L.; writing—original draft preparation, H.X. and S.L.; writing—review and editing, H.X.; supervision, W.W. and L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61762085), and the Natural Science Foundation of Xinjiang Uygur Autonomous Region Project (2019D01C081).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pang, B.; Lee, L. Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2008, 2, 1–135.
2. Liu, B. Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
3. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 4, 1093–1113.
4. Fink, C.R.; Chou, D.S.; Kopecky, J.J. Coarse- and fine-grained sentiment analysis of social media text. Johns Hopkins APL Tech. Dig. 2011, 1, 22–30.
5. Rana, T.A.; Cheah, Y. Aspect extraction in sentiment analysis: Comparative analysis and survey. Artif. Intell. Rev. 2016, 46, 459–483.
6. Tang, H.; Ji, D.H.; Li, C.L.; Zhou, Q. Dependency graph enhanced dual-transformer structure for aspect-based sentiment classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6578–6588.
7. Lai, Y.; Zhang, L.; Han, D.; Zhou, R.; Wang, G. Fine-grained emotion classification of Chinese microblogs based on graph convolution networks. World Wide Web 2020, 23, 2771–2787.
8. Zhao, P.; Hou, L.; Wu, O. Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl.-Based Syst. 2020, 193, 1–10.
9. Wan, H.; Yang, Y.F.; Du, J.F.; Liu, Y.A.; Qi, K.X.; Pan, J.Z. Target-aspect-sentiment joint detection for aspect-based sentiment analysis. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9122–9129.
10. Schmitt, M.; Steinheber, S.; Schreiber, K.; Roth, B. Joint aspect and polarity classification for aspect-based sentiment analysis with end-to-end neural networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1109–1114.
11. Wang, Y.; Huang, M.; Zhu, X. Attention-based LSTM for aspect-level sentiment classification. Empir. Methods Nat. Lang. Process. 2016, 1, 606–615.
12. Ma, D.; Li, S.; Zhang, X.; Wang, H. Interactive attention networks for aspect-level sentiment classification. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 4068–4074.
13. Zhang, X.; Gao, T. Multi-head attention model for aspect level sentiment analysis. J. Intell. Fuzzy Syst. 2020, 38, 89–96.
14. Liu, P.; Liu, T.; Shi, J. Aspect level sentiment classification with unbiased attention and target enhanced representations. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Virtual, 30 March–3 April 2020; pp. 843–850.
15. Zhang, C.; Li, Q.C.; Song, D.W. Aspect-based sentiment classification with aspect-specific graph convolutional networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Volume 2, pp. 4560–4570.
16. Sun, K.; Zhang, R.C.; Samuel, M.; Mao, Y.Y.; Liu, X.D. Aspect-level sentiment analysis via convolution over dependency tree. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5679–5688.
17. Huang, B.X.; Carley, K.M. Syntactic-aware aspect level sentiment classification with graph attention networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5472–5480.
18. Wang, X.; Zhu, M.Q.; Bo, D.; Cui, P.; Shi, C.; Pei, J. AM-GCN: Adaptive multichannel graph convolutional networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1243–1253.
19. Zheng, Y.W.; Zhang, R.C.; Samuel, M.; Mao, Y.Y. Replicate, walk, and stop on syntax: An effective neural network model for aspect-level sentiment classification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9685–9692.
20. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. ICLR 2015, 7, 1–15.
21. Baccianella, S.; Esuli, A.; Sebastiani, F. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, 17–23 May 2010; pp. 83–90.
22. Ku, L.W.; Chen, H.H. Mining opinions from the Web: Beyond relevance retrieval. J. Am. Soc. Inf. Sci. Technol. 2007, 58, 1838–1850.
23. HowNet Sentiment Dictionary. Available online: https://cidian.cnki.net (accessed on 22 October 2007).
24. Nguyen, T.H.; Shirai, K. Aspect-based sentiment analysis using tree kernel based relation extraction. In Lecture Notes in Computer Science, Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, Cairo, Egypt, 14–20 April 2015; Springer: Cham, Switzerland, 2015; pp. 114–125.
25. Ding, X.; Liu, B.; Yu, P. A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 International Conference on Web Search and Data Mining, Palo Alto, CA, USA, 11–12 February 2008; pp. 231–240.
26. Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, USA, 6–7 July 2002; pp. 79–86.
27. Waila, P.; Marisha; Singh, V.K.; Singh, M.K. Evaluating machine learning and unsupervised semantic orientation approaches for sentiment analysis of textual reviews. In Proceedings of the 2012 IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, India, 18–20 December 2012; pp. 1–6.
28. Gang, Z.; Zan, X. Research on sentiment analysis model of commodity reviews based on machine learning. Inf. Secur. Res. 2017, 2, 166–170.
29. Chen, Y.H. Convolutional Neural Networks for Sentence Classification. Master's Thesis, University of Waterloo, Waterloo, ON, Canada, 2015.
30. Zhang, Y.; Jiang, Y.; Tong, Y. Study of sentiment classification for Chinese microblog based on recurrent neural network. Chin. J. Electron. 2016, 4, 601–607.
31. Nowak, J.; Taspinar, A.; Scherer, R. LSTM recurrent neural networks for short text and sentiment classification. In Lecture Notes in Computer Science, Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 11–15 June 2017; Springer: Cham, Switzerland, 2017; pp. 553–562.
32. Wang, Y.Q.; Huang, M.; Zhu, X.Y.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615.
33. Song, Y.; Wang, J.H.; Tao, J.; Liu, Z.Y.; Rao, Y. Attentional encoder network for targeted sentiment classification. arXiv 2019, arXiv:1902.09314.
34. Chen, P.; Sun, Z.Q.; Bing, L.D.; Yang, W. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 452–461.
35. Li, C.; Guo, X.; Mei, Q. Deep memory networks for attitude identification. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, UK, 6–10 February 2017; pp. 671–680.
36. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 2017 International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
37. Wang, K.; Shen, W.; Yang, Y.Y.; Quan, X.J.; Wang, R. Relational graph attention network for aspect-based sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3229–3238.
38. Huang, B.X.; Carley, K.M. Parameterized convolutional neural networks for aspect level sentiment classification. arXiv 2019, arXiv:1909.06276.
39. Li, R.F.; Chen, H.; Feng, F.X.; Ma, Z.Y.; Wang, X.J.; Hovy, E. Dual graph convolutional networks for aspect-based sentiment analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 6319–6329.
40. Pennington, J.; Socher, R.; Manning, C. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
41. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. (NIPS) 2017, 30, 5998–6008.
42. Pylkkänen, L. The neural basis of combinatory syntax and semantics. Science 2019, 366, 62–66.
43. Dong, L.; Wei, F.; Tan, C.Q.; Tang, D.Y.; Zhou, M.; Xu, K. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 23–24 June 2014; pp. 49–54.
44. Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014 Task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 27–35.
45. Huang, B.; Ou, Y.; Carley, K.M. Aspect level sentiment classification with attention-over-attention neural networks. In Lecture Notes in Computer Science, Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, Washington, DC, USA, 10–13 July 2018; Springer: Cham, Switzerland, 2018; pp. 197–206.
46. Fan, F.F.; Feng, Y.S.; Zhao, D.Y. Multi-grained attention network for aspect-level sentiment classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3433–3442.
47. Zhang, M.; Qian, T.Y. Convolution over hierarchical syntactic and lexical graphs for aspect level sentiment analysis. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–18 November 2020; pp. 3540–3549.
Figure 1. RAG-TCGCN model structure.
Table 1. Comparison of different sentiment analysis methods.

| Method | Characteristic | Representative |
| --- | --- | --- |
| Sentiment dictionary | Divides sentiment polarity at different granularities according to the polarity of the sentiment words provided by the dictionary. | SentiWordNet, NTUSD, HowNet, et al. |
| Machine learning | Generally divides sentiment polarity in two stages: feature extraction and classification algorithm design. | SVM, NBM, LR, et al. |
| Deep learning | Mainly uses simple neural networks, attention-based neural networks, and graph convolutional networks to divide sentiment polarity. | CNN, LSTM, GCN, et al. |

Comparison results: (1) Performance: deep learning is more accurate than traditional machine learning for sentiment classification. (2) Time and hardware: traditional machine learning requires less time and hardware than deep learning training.
Table 2. Statistics of datasets.

| Dataset | Division | Positive | Negative | Neutral |
| --- | --- | --- | --- | --- |
| Rest14 | Training | 2164 | 807 | 637 |
| Rest14 | Testing | 728 | 196 | 196 |
| Lap14 | Training | 994 | 851 | 455 |
| Lap14 | Testing | 341 | 128 | 167 |
| Twitter | Training | 1507 | 1528 | 3016 |
| Twitter | Testing | 173 | 169 | 336 |
Table 3. Experimental environment.

| Item | Configuration |
| --- | --- |
| System | Windows 10 |
| CPU | Intel(R) Core(TM) i7-10510U |
| GPU | NVIDIA GeForce MX250 |
| Language | Python 3.8 |
| Tool | PyCharm 2021 |
Table 4. Experimental parameters.

| Parameter Settings | Value |
| --- | --- |
| embed_dim | 300 |
| batch_size | 16 |
| rnn_hidden | 50 |
| input_dropout | 0.7 |
| gcn_dropout | 0.1 |
| num_epoch | 50 |
| learning_rate | 0.002 |
Table 5. Experimental results comparison on three publicly available datasets.

| Models | Restaurant Accuracy | Restaurant Macro-F1 | Laptop Accuracy | Laptop Macro-F1 | Twitter Accuracy | Twitter Macro-F1 |
| --- | --- | --- | --- | --- | --- | --- |
| IAN | 78.60 | - | 72.10 | - | - | - |
| AOA | 80.53 | 69.84 | 72.88 | 67.48 | 72.25 | 69.96 |
| RAM | 80.23 | 70.80 | 74.49 | 71.35 | 69.36 | 67.30 |
| MGAN | 81.25 | 71.94 | 75.39 | 72.47 | 72.54 | 70.81 |
| ASGCN-DG | 80.77 | 72.02 | 75.55 | 71.05 | 72.15 | 70.40 |
| ASGCN-DT | 80.86 | 72.19 | 74.14 | 69.24 | 71.53 | 69.68 |
| CDT | 82.30 | 74.02 | 77.19 | 72.99 | 74.66 | 73.66 |
| BiGCN | 81.97 | 73.48 | 74.59 | 71.84 | 74.16 | 73.35 |
| R-GAT | 83.30 | 76.08 | 77.42 | 73.76 | 75.57 | 73.82 |
| DGEDT | 83.90 | 75.10 | 76.80 | 72.30 | 74.80 | 73.40 |
| DualGCN | **84.27** | **78.08** | 78.48 | 74.74 | 75.92 | 74.29 |
| RAG-TCGCN | 84.09 | 77.02 | **78.80** | **75.04** | **76.66** | **75.41** |

The symbol '-' indicates this result is not available in their work. The best results are in bold.
Table 6. Experimental results of ablation study.

| Models | Restaurant Accuracy | Restaurant Macro-F1 | Laptop Accuracy | Laptop Macro-F1 | Twitter Accuracy | Twitter Macro-F1 |
| --- | --- | --- | --- | --- | --- | --- |
| SynGCN | 82.57 | 75.06 | 76.90 | 72.60 | 74.59 | 73.13 |
| SemGCN | 82.48 | 73.12 | 76.42 | 72.19 | 75.18 | 73.87 |
| Common | 81.50 | 71.99 | 77.22 | 73.14 | 74.45 | 73.39 |
| TCGCN | 82.84 | 74.70 | 77.85 | 73.97 | 75.78 | 74.59 |
| RAG-TCGCN | **84.09** | **77.02** | **78.80** | **75.04** | **76.66** | **75.41** |
Table 7. Visual analysis cases of IAN, SynGCN, SemGCN, Common, and RAG-TCGCN.

| Model | Aspect | Attention Visualization | Prediction | Label |
| --- | --- | --- | --- | --- |
| IAN | food | Great food but terrible service | Neutral | Positive |
| IAN | dinner | My wife and I recently visited the bistro for dinner and have a wonderful experience | Positive | Neutral |
| IAN | Windows11 | Did not enjoy the new Windows11 and touchscreen functions | Neutral | Negative |
| SynGCN | food | Great food but terrible service | Positive | Positive |
| SynGCN | dinner | My wife and I recently visited the bistro for dinner and have a wonderful experience | Positive | Neutral |
| SynGCN | Windows11 | Did not enjoy the new Windows11 and touchscreen functions | Positive | Negative |
| SemGCN | food | Great food but terrible service | Positive | Positive |
| SemGCN | dinner | My wife and I recently visited the bistro for dinner and have a wonderful experience | Positive | Neutral |
| SemGCN | Windows11 | Did not enjoy the new Windows11 and touchscreen functions | Negative | Negative |
| Common | food | Great food but terrible service | Positive | Positive |
| Common | dinner | My wife and I recently visited the bistro for dinner and have a wonderful experience | Positive | Neutral |
| Common | Windows11 | Did not enjoy the new Windows11 and touchscreen functions | Negative | Negative |
| RAG-TCGCN | food | Great food but terrible service | Positive | Positive |
| RAG-TCGCN | dinner | My wife and I recently visited the bistro for dinner and have a wonderful experience | Neutral | Neutral |
| RAG-TCGCN | Windows11 | Did not enjoy the new Windows11 and touchscreen functions | Negative | Negative |