1. Introduction
Sentiment analysis refers to the study of people’s opinions, emotions, and attitudes towards services, events, products, topics, and their attributes. With the rapid development of social media, sentiment analysis has become a basic task in the field of natural language processing [1]. According to the granularity of the text under study, sentiment analysis can be divided into three types: text-level sentiment classification, sentence-level sentiment classification, and aspect-based sentiment analysis (ABSA). Accurately judging the emotional polarity of the different emotional subjects in a sentence has become an important research direction in the field [2]. For example, the comment “The waiters are very friendly, and the pasta is simply average” contains two aspect words, “waiters” and “pasta”; their emotional polarities are positive and neutral, respectively, and the corresponding opinion words are “friendly” and “simply average”. Aspect-level sentiment classification is promising in various industries. For example, in the field of catering services, restaurants can quickly and deeply analyze users’ consumption habits and personal preferences by analyzing their product reviews [3], and the same technique can be used to quickly understand the public’s attitude towards a policy or opinion on a hot topic [4].
In some early works, researchers used manually designed feature sets to train Support Vector Machine (SVM) classifiers to solve aspect-level sentiment analysis tasks [5,6,7]. Later, neural network-based methods became widely used in this task due to their flexible structures and automatic feature extraction capabilities. Most neural network models are based on long short-term memory (LSTM) [8,9]. However, LSTM extracts features from left to right or from right to left in the text, which is not a true bidirectional text feature extraction method. To focus on the parts of a sentence that relate to a particular aspect, many attention-based models have emerged [10,11,12,13], and various BERT-based language models [14] have achieved state-of-the-art performance in many NLP tasks. Masked language modeling allows BERT to incorporate context from both directions, making it a pre-trained deep bidirectional transformer. Nevertheless, in most models, the aspect word and context word vectors produced by the intermediate hidden layers are used as an explicit representation of the context [15] and are further used to compute the aspect score. However, it is difficult for BERT to describe the linguistic knowledge in the text, so much potential emotional information is lost.
Based on these observations, this paper proposes a sentiment analysis model based on BERT and a knowledge graph (SABKG). First, the model incorporates part-of-speech information into the output representation of BERT, obtains text semantic information through linguistic knowledge, enhances the text representation, and better identifies aspect objects through BERT. Then, the model constructs “aspect word, sentiment polarity, sentiment word” triplets and builds a knowledge graph of aspect words and sentiment words, in which the aspect words and sentiment words are nodes and the sentiment polarity is the edge. The embeddings of the nodes and edges in the triplets are learned using RGCN and then used as the input to TransH. The proposed framework can enrich the contextual relationship between aspect words and sentiment words in the text, strengthen the representation of aspect words and sentiment words, and predict sentiment polarity.
This paper evaluates the proposed aspect-based sentiment analysis model SABKG on three public benchmark datasets. The experimental results show that the model performs better than other previous models. The main contributions of this paper are as follows:
	  
- Adds part-of-speech features to the input of BERT. Specifically, the part-of-speech feature obtained by word2vec (w2v) is added to the input of BERT to learn the part of speech of subject words and sentiment words. This improves the model’s ability to understand linguistic knowledge to a certain extent. 
- Uses a combination of word vectors, position vectors, and part-of-speech vectors to construct “aspect word, sentiment polarity, sentiment word” triplets through BERT. The part-of-speech features are enriched for entity recognition and part-of-speech tagging to prepare for downstream learning. 
- Takes the triplets output by BERT+CRF as the input of RGCN. The embedding representations of the aspect word, sentiment word, and sentiment polarity in each triplet are learned through the graph neural network. The three embeddings are input into TransH to strengthen the representation of words with different sentiment polarities. 
The rest of this article is organized as follows. Related methods for aspect-based sentiment analysis are reviewed in Section 2, and their advantages and disadvantages are discussed. The SABKG model structure is introduced in detail in Section 3. Section 4 verifies the model’s effectiveness experimentally and compares its performance with other methods. Section 5 summarizes the work of this paper.
  2. Related Work
ABSA aims to identify the emotional polarity of an aspect of a given text. Most early works constructed feature sets by manually labeling features, including aspect feature sets, dictionary feature sets, and parsing feature sets, to train an SVM for sentiment analysis [5,6,7]. However, the immense workload of manual labeling makes such feature sets very difficult to create. With the development of neural networks, more and more neural network methods are being used to solve this task. The LSTM model can learn long-term dependencies due to its strong ability to forget, memorize, and update information. At the same time, it can mitigate the vanishing gradient problem [9,10,11,12,13,14,15,16]. Basiri et al. [17] proposed an attention-based bidirectional CNN-RNN model, which obtains temporal context information through bidirectional GRU and LSTM layers. Li et al. [18] used a two-layer LSTM model to perform the ABSA task and then concatenated the last hidden states of the two LSTMs as the classification feature. Finally, the emotional features of the context were integrated into the aspect representation. However, the one-way feature extraction of LSTM cannot truly capture global context.
Some researchers use external knowledge to enhance the semantic representation and improve the model’s performance. Ma et al. [8] added external features from SenticNet to the original structure of LSTM to strengthen the model’s ability for sentiment analysis. To enhance the sentiment polarity prediction ability of LSTM for input text, Teng et al. [19] performed manual polarity classification on the sentiment words in each sentence. Tay et al. [20] used attention-gated units to learn embeddings by combining sentiment lexicon and sentence information. A humanlike approach that takes sentiment grammar knowledge as prior knowledge was proposed for aspect-level sentiment classification [21]. Ghosal et al. [22] proposed a domain-adaptive model that leverages external commonsense knowledge to improve sentiment analysis performance. Chen et al. [23] designed a sentiment analysis framework, KNEE, which combines sentiment classification and the recognition of aspect-sentiment pairs into one text classification task. Fares et al. [24] introduced an unsupervised word-level sentiment analysis framework that computes and propagates sentiment scores in lexical sentiment graphs. Cambria et al. [25] built a new sentiment analysis knowledge base with integrated applications of symbolic artificial intelligence. A recent trend in graph embedding techniques is to embed whole networks rather than individual nodes [26], which can also be used for sentiment analysis. Existing sentiment lexicon methods have similar motivations, but struggle to describe aspect–sentiment relationships for sentiment recognition and classification.
With the continuous development of language models, the text pre-training model BERT proposed by Devlin et al. [27] performs well in text classification tasks. It uses a bidirectional transformer to build the overall network and learns the context information on both the left and right sides. At the same time, during pre-training, text vector features can be learned from shallow to deep. Compared with RNN and LSTM, BERT can process a sequence in parallel and extract information at different levels, reflecting more comprehensive sentence semantics. Sun et al. [28] used aspects to construct interrogative sentences and combined them with the BERT model to propose models such as BERT-pair-QA-M. Their experimental results show that, on short-text ABSA datasets, this approach classifies better than earlier neural network algorithms. Gao et al. [29] proposed the TD-BERT model based on BERT and further improved the classification performance by extracting the features at specific positions of the BERT encoding layer for aspect-level sentiment classification, indicating that BERT has a superior performance in feature extraction. However, previous BERT-based research did not solve the problem that BERT struggles to capture the large amount of linguistic knowledge in the text; it simply extracts text features, which loses much potential emotional information.
In this paper, we combined BERT and RGCN to enable our model to exploit the syntactic structure information of sentences. Due to the complexity of the association between text contents, we constructed the “aspect word, sentiment polarity, sentiment word” triplet and used the RGCN to learn the embeddings in the triplet and build the knowledge graph of the aspect word and sentiment word. The context relationship between subject words and emotion words was further extracted to improve the accuracy of emotion polarity prediction.
  3. Proposed Method
This paper adopts a two-stage model design; the overall design of the model is shown in Figure 1. In the first stage, the BERT and CRF modules extract, from all reviews, the target aspect words, the sentiment words that may describe them, and the aspect term polarity of the target words. First, the processed text is encoded by the BERT encoder to obtain the corresponding word vector $h$, and the part of speech of each word is input into w2v to obtain the part-of-speech embedding $w$. Their mean value is taken as the input of the CRF. The second stage constructs the “aspect word, sentiment polarity, sentiment word” triplets and uses RGCN to learn the embeddings of the aspect, sentiment polarity, and sentiment word. Here, rel_i represents the heterogeneous graphs of the different triplets, which are used as the input of RGCN, and ReLU is the activation function. Finally, TransH is used to process the output of RGCN, so as to improve the accuracy of predicting the emotional polarity corresponding to the aspect words.
  3.1. BERT-CRF Aspect and Sentiment Word Recognition
The first layer of the model uses the pre-trained BERT language model to initialize the word vectors of the input text sequence, which effectively extracts text features by using the relationships between words. The second layer of the model uses the average of the word vector generated by BERT and the part-of-speech vector as the input of the CRF. The Language Technology Platform (LTP) tool is used to segment the input text, and the BIO template is then used to mark the word segmentation result to indicate phrase boundaries. If a word is the starting word of a phrase, it is marked with a “B” label; if it is a subsequent word of the phrase, it is marked as “I”; if it does not belong to a phrase, it is marked as “O”. After obtaining the word segmentation labels, this paper uses word embedding technology to map them to the vector space, so that the CRF can segment and label the sequence data and predict the corresponding state sequence for the input sequence while considering both the current state features of the input and the transition features of each label category. The CRF takes the output sequence predicted by the BERT model and finds the label sequence that optimizes the objective function.
  3.1.1. Entity Recognition
A user comment may contain both positive and non-positive remarks. For example, “Waiters are very friendly and the pasta is simply average” mentions two comment subjects: “waiters” and “pasta”. Their sentiment polarities are “Positive” and “Neutral”, respectively. At the same time, the main words expressing these sentiments are “friendly” and “average”. Therefore, we first need to extract the aspect words and the sentiment words that explain the sentiment polarity in the text.
First, the BIO template is used to label the text sequence. Sequence labeling assigns one label to each word in the text: B (begin) marks the beginning of an entity, I (inside) marks the middle or the end of an entity, and O (outside) marks a word that does not belong to any entity. For example, “John Smith lives in New York” could be labeled “B-name, I-name, O, O, B-name, I-name”. Here, B-name and I-name mark the subjects; that is, John Smith and New York are the subjects.
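The BIO scheme described above can be illustrated with a minimal sketch that groups tagged words back into entity spans. The function name `decode_bio` and the rule that an inconsistent “I” tag closes the current span are illustrative assumptions, not part of the method described in this paper.

```python
def decode_bio(tokens, tags):
    """Group tokens into (type, text) entity spans from BIO tags such as 'B-name'."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A 'B' tag always starts a new entity, closing any open one.
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # An 'I' tag of the same type continues the open entity.
            current.append(token)
        else:
            # 'O' (or an inconsistent 'I' tag, by assumption) closes the open entity.
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [], None
    if current:
        spans.append((current_type, " ".join(current)))
    return spans


tokens = ["John", "Smith", "lives", "in", "New", "York"]
tags = ["B-name", "I-name", "O", "O", "B-name", "I-name"]
entities = decode_bio(tokens, tags)
print(entities)
```

For the example sentence, the two recovered spans are “John Smith” and “New York”, both of type “name”.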
The word sequence of the processed text is used as the input of BERT, and the part-of-speech sequence of the words in the text (obtained as described in Section 3.1.2) is encoded by word2vec to give the part-of-speech vector $w$. The vector $h$ is the word vector of the sentence produced by BERT. The vectors $w$ and $h$ are fused by taking their mean, $(w + h)/2$, and the resulting vector is used as the input of the CRF module.
The hidden context vector of each word can be used directly as a feature, so that each output $y_i$ of the label sequence $y = (y_1, \ldots, y_n)$ is decided independently. Combined with the state transition matrix of the CRF, a globally optimal sequence is obtained by taking adjacent labels into account, and the score of a label sequence can be expressed as:

$$s(X, y) = \sum_{i=1}^{n-1} Z_{y_i, y_{i+1}} + \sum_{i=0}^{n-1} P_{i+1, y_{i+1}}$$

where $X$ is the input sequence; $Z$ is the transition matrix; $Z_{y_i, y_{i+1}}$ is the score of the label transfer from $y_i$ to $y_{i+1}$; and $P_{i+1, y_{i+1}}$ is the score of the $(i+1)$th word in the input sequence corresponding to the label $y_{i+1}$. The probability of the label sequence $y$ is calculated as:

$$p(y \mid X) = \frac{\exp\left(s(X, y)\right)}{\sum_{\tilde{y} \in Y_X} \exp\left(s(X, \tilde{y})\right)}$$

where $Y_X$ is the set of all possible label sequences, and the final output is the label sequence with the highest probability.
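The scoring and decoding of a linear-chain CRF can be sketched on a toy example. The sketch below assumes that the sequence score is the sum of per-word emission scores and label-to-label transition scores, with no special start or end label; the emission and transition values are made-up numbers, and the label order (0 = B, 1 = I, 2 = O) is an illustrative choice. The probability is computed by brute-force enumeration, which is only feasible for very short sequences, while Viterbi decoding finds the highest-scoring sequence efficiently.

```python
import itertools
import math


def sequence_score(emissions, transitions, labels):
    """Sum of emission scores plus transition scores between adjacent labels."""
    score = sum(emissions[i][y] for i, y in enumerate(labels))
    score += sum(transitions[a][b] for a, b in zip(labels, labels[1:]))
    return score


def sequence_probability(emissions, transitions, labels):
    """Softmax of the sequence score over all label sequences (brute force)."""
    num_labels = len(transitions)
    all_paths = itertools.product(range(num_labels), repeat=len(emissions))
    partition = sum(math.exp(sequence_score(emissions, transitions, p)) for p in all_paths)
    return math.exp(sequence_score(emissions, transitions, labels)) / partition


def viterbi(emissions, transitions):
    """Return the label sequence with the highest score."""
    num_labels = len(transitions)
    best = list(emissions[0])  # best score of any path ending in each label
    back = []                  # back-pointers, one list per later position
    for i in range(1, len(emissions)):
        new_best, pointers = [], []
        for cur in range(num_labels):
            prev = max(range(num_labels), key=lambda p: best[p] + transitions[p][cur])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][cur] + emissions[i][cur])
        best = new_best
        back.append(pointers)
    path = [max(range(num_labels), key=lambda t: best[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for a three-word input; labels are 0 = B, 1 = I, 2 = O.
emissions = [[2.0, 0.1, 0.5], [0.3, 1.5, 0.4], [0.2, 0.3, 1.8]]
transitions = [[-1.0, 1.0, 0.0], [-0.5, 0.5, 0.5], [0.5, -10.0, 0.5]]  # O -> I is penalized
best_path = viterbi(emissions, transitions)
print(best_path, sequence_probability(emissions, transitions, best_path))
```

The large negative O-to-I transition score shows how the transition matrix discourages label sequences that are invalid under the BIO scheme.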
  3.1.2. Sentiment Word Recognition
To identify the sentiment words that describe the target aspect, the Natural Language Toolkit (NLTK) is used to tag the part of speech of each word in the text. Because the words describing a target aspect are generally adjectives, adverbs, interjections, and similar types, this paper extracts adjectives, comparative adjectives, superlative adjectives, etc., from the text. Let Ysent_set = {adj, adv, adj_com, adj_sup, adv_com, adv_sup} be an ordered set of sentiment word types. For each aspect word i identified in Section 3.1.1, the short sentence is scanned from i outwards to both sides within a radius r = 8. The first word y encountered whose part of speech belongs to Ysent_set is taken as the sentiment word corresponding to the aspect word i. Multiple aspect words are handled by the same rule. See Algorithm 1 for details.
          
| Algorithm 1. Recognition of aspect words | 
| Input: Aspect word i Output: Sentiment word y
 for elem in S do    % S is the sentence set, elem is a sentence
   asp_set = get_asp(elem)
   for i in asp_set do
    for k in range(1, r + 1) do
     % get_pob(i + k) obtains the part of speech of the kth word to the right of the aspect word i
     if y = get_pob(i + k) and in_Ysent == True then
      y is the sentiment word corresponding to i; stop scanning
     % get_pob(i - k) obtains the part of speech of the kth word to the left of the aspect word i
     else if y = get_pob(i - k) and in_Ysent == True then
      y is the sentiment word corresponding to i; stop scanning
     else
      no sentiment word at distance k
     end if
    end for
   end for
 end for
 | 
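The window scan of Algorithm 1 can be sketched in Python as follows. To keep the example self-contained, the sentence is tagged by hand rather than by NLTK, and the mapping from Ysent_set to the Penn Treebank tags JJ, JJR, JJS, RB, RBR, and RBS is an assumption made for this illustration; the function name `find_sentiment_word` is likewise hypothetical.

```python
# Penn Treebank tags assumed to correspond to Ysent_set:
# adj=JJ, adj_com=JJR, adj_sup=JJS, adv=RB, adv_com=RBR, adv_sup=RBS.
SENTIMENT_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}


def find_sentiment_word(tagged, aspect_index, radius=8, sentiment_tags=SENTIMENT_TAGS):
    """Return the nearest word whose tag is in sentiment_tags, or None.

    tagged is a list of (word, pos_tag) pairs. At each distance k the word to
    the right of the aspect is checked before the word to the left.
    """
    for k in range(1, radius + 1):
        for position in (aspect_index + k, aspect_index - k):
            if 0 <= position < len(tagged):
                word, tag = tagged[position]
                if tag in sentiment_tags:
                    return word
    return None


# Example sentence with tags written by hand (not produced by a tagger).
tagged = [
    ("The", "DT"), ("waiters", "NNS"), ("are", "VBP"), ("very", "RB"),
    ("friendly", "JJ"), ("and", "CC"), ("the", "DT"), ("pasta", "NN"),
    ("is", "VBZ"), ("simply", "RB"), ("average", "JJ"),
]
adjectives_only = {"JJ", "JJR", "JJS"}
for aspect_index in (1, 7):
    aspect = tagged[aspect_index][0]
    print(aspect,
          find_sentiment_word(tagged, aspect_index),
          find_sentiment_word(tagged, aspect_index, sentiment_tags=adjectives_only))
```

With the full tag set, the scan stops at the adverbs “very” and “simply” because they are closer to the aspect words than the adjectives; restricting the set to adjectives returns “friendly” and “average”. Which behavior is preferable depends on how multi-word opinion phrases such as “simply average” are meant to be handled.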
  3.2. Building a Knowledge Graph with RGCN
  3.2.1. Construct a Heterogeneous Graph of “Aspect Word, Sentiment Polarity, Sentiment Word”
Given a sentence $S$, the aspect item extraction model first extracts a set of aspects $A = \{a_1, \ldots, a_m\}$. For each extracted aspect $a_j$, the aspect-oriented sentiment word extraction model extracts its sentiment words $\{s_{j,1}, \ldots, s_{j,n_j}\}$, where $n_j$ is the number of sentiment words about the $j$th aspect. Finally, for each extracted aspect–opinion pair $(a_j, s_{j,k})$, its sentiment $r_{j,k} \in \{$positive, neutral, negative$\}$ is predicted by the aspect–opinion pair sentiment classification model. Triples are obtained by combining the results of the three models: $(a_j, r_{j,k}, s_{j,k})$. The prediction module is integrated into the whole model as a sub-task.
  3.2.2. Construction of Knowledge Graph
Following RGCN, a heterogeneous graph of “aspect word, sentiment polarity, sentiment word” triples is constructed, as shown in Figure 2, which takes “Waiters are very friendly, and the pasta is simply average, it tastes just right” as an example. The nodes represent aspect and sentiment words, and the edges (relations) represent the sentiment polarity (positive, negative, or neutral). The heterogeneous graph is used to identify the emotional relationships in the sentence and serves as the input of RGCN.
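As a minimal sketch of the data structure involved, the triples can be stored as one edge list per sentiment polarity, with aspect and sentiment words mapped to integer node ids. The function name `build_relation_graph` and the dictionary layout are illustrative assumptions rather than the representation used in the experiments.

```python
from collections import defaultdict


def build_relation_graph(triples):
    """Map words to node ids and group (aspect, sentiment) edges by polarity."""
    node_ids = {}
    edges = defaultdict(list)
    for aspect, polarity, sentiment in triples:
        for word in (aspect, sentiment):
            # Assign the next free id the first time a word is seen.
            node_ids.setdefault(word, len(node_ids))
        edges[polarity].append((node_ids[aspect], node_ids[sentiment]))
    return node_ids, dict(edges)


triples = [
    ("waiters", "positive", "friendly"),
    ("pasta", "neutral", "simply average"),
]
node_ids, edges = build_relation_graph(triples)
print(node_ids)
print(edges)
```

Keeping a separate edge list for each polarity matches the way a relational graph network applies a different transformation to each relation type.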
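The triplets obtained in the previous step are input into the RGCN model, where the aspect words and sentiment words are nodes and the sentiment polarity is the edge. The propagation model is as follows:

$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$

where $h_i^{(l)}$ represents the hidden state of node $v_i$ in hidden layer $l$; $\mathcal{R}$ is the set of relations; $N_i^r$ represents the set of neighbor nodes of node $i$ under relation $r$; $W_r^{(l)}$ and $W_0^{(l)}$ are the relation-specific and self-connection weight matrices; $\sigma$ is the activation function (ReLU); and $c_{i,r}$ is a normalization constant.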
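One relational graph convolution step can be sketched in plain Python. The sketch assumes the standard R-GCN update: each node sums the relation-specific transforms of its neighbors, normalized by the number of neighbors under that relation, adds a self-connection term, and applies ReLU. The toy features, the weight matrices, and the choice to list every edge in both directions are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_layer(features, neighbors, rel_weights, self_weight):
    """One R-GCN propagation step.

    features:    list of node vectors h_i
    neighbors:   neighbors[relation][i] is the list of neighbor ids of node i
    rel_weights: one weight matrix W_r per relation
    self_weight: weight matrix W_0 of the self-connection
    """
    outputs = []
    for i, h_i in enumerate(features):
        total = matvec(self_weight, h_i)
        for relation, weight in rel_weights.items():
            nbrs = neighbors.get(relation, {}).get(i, [])
            for j in nbrs:
                message = matvec(weight, features[j])
                # c_{i,r} is taken to be the number of neighbors under this relation.
                total = [t + m / len(nbrs) for t, m in zip(total, message)]
        outputs.append([max(0.0, t) for t in total])  # ReLU
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
# Nodes: 0 waiters, 1 friendly, 2 pasta, 3 average, 4 attentive (toy features).
features = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
neighbors = {
    "positive": {0: [1, 4], 1: [0], 4: [0]},
    "neutral": {2: [3], 3: [2]},
}
rel_weights = {"positive": identity, "neutral": half}
updated = rgcn_layer(features, neighbors, rel_weights, self_weight=identity)
print(updated)
```

Node 0 has two neighbors under the positive relation, so their messages are averaged before being added to its own transformed feature.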
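To prevent the model from overfitting, this paper decomposes the relation weight matrices so that parameters are shared between the weights of different relations:

$$W_r^{(l)} = \sum_{b=1}^{B} \alpha_{rb}^{(l)} V_b^{(l)}$$

where the $V_b^{(l)}$ are $B$ basis matrices shared by all relations, and only the coefficients $\alpha_{rb}^{(l)}$ depend on the relation $r$.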
The edges that are observed do not form the complete set of edges $E$ but only an incomplete subset $\hat{E}$. The task is therefore to assign a score $f(a, r, s)$ to each possible edge $(a, r, s)$ in order to determine how likely it is that the edge belongs to $E$. This paper uses the DistMult factorization as the scoring function. In DistMult, each relation $r$ is associated with a diagonal matrix $R_r$, and a triple $(a, r, s)$ is scored as:

$$f(a, r, s) = e_a^{\top} R_r e_s$$

where $e_a$ is the real-valued vector to which the encoder maps each aspect entity $a$, and $e_s$ is defined in the same way for each sentiment entity $s$.
To balance positive and negative samples, $\omega$ negative samples are constructed for each positive sample (existing edge). A negative sample can be constructed by connecting completely unrelated nodes, or by replacing the entity $a$ or $s$ of a relation $(a, r, s)$ with another high-frequency entity. This paper uses the cross-entropy loss function for optimization:

$$L_{CE} = -\frac{1}{(1+\omega)|\hat{E}|} \sum_{(a, r, s, y) \in T} \Big[ y \log l\big(f(a, r, s)\big) + (1 - y) \log\big(1 - l(f(a, r, s))\big) \Big]$$

where $T$ is the total set of true and corrupted triples, $l$ is the logistic sigmoid function, and $y$ is an indicator, with $y = 1$ for positive triples and $y = 0$ for negative triples.
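The scoring and training signal described above can be sketched as follows. The sketch assumes DistMult in its usual form (an element-wise product of the two entity vectors and the relation's diagonal, summed), a negative label of 0, and negatives built by replacing one entity at random; the embeddings are toy values, and the helper names are hypothetical. Corrupted triples are not checked against the set of true triples here.

```python
import math
import random


def distmult_score(e_a, r_diag, e_s):
    """DistMult score: sum_k e_a[k] * r_diag[k] * e_s[k]."""
    return sum(a * r * s for a, r, s in zip(e_a, r_diag, e_s))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def corrupt(triple, entities, rng):
    """Build a negative sample by replacing the aspect or the sentiment word."""
    a, r, s = triple
    if rng.random() < 0.5:
        a = rng.choice([e for e in entities if e != a])
    else:
        s = rng.choice([e for e in entities if e != s])
    return (a, r, s)


def cross_entropy_loss(labeled_triples, embeddings, relations):
    """Mean binary cross-entropy over (triple, label) pairs, label 1 or 0."""
    total = 0.0
    for (a, r, s), y in labeled_triples:
        p = sigmoid(distmult_score(embeddings[a], relations[r], embeddings[s]))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labeled_triples)


embeddings = {"waiters": [1.0, 0.0], "friendly": [1.0, 0.0], "average": [0.0, 1.0]}
relations = {"positive": [2.0, 2.0]}  # diagonal of the relation matrix
positive = ("waiters", "positive", "friendly")
negative = ("waiters", "positive", "average")
loss = cross_entropy_loss([(positive, 1), (negative, 0)], embeddings, relations)
sampled = corrupt(positive, list(embeddings), random.Random(0))
print(loss, sampled)
```

Averaging over all labeled triples corresponds to dividing by the number of positive triples times one plus the number of negatives per positive.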
  3.2.3. TransH Output Layer
The embeddings of nodes and edges obtained by RGCN in Section 3.2.2 are used as the input to TransH. Because TransH can model one-to-many, many-to-one, and many-to-many relationship patterns in a graph, this paper uses it to model the knowledge graph. The semantic information of a triple is denoted $(a, r, s)$, and the vector representations output by the graph convolutional network are used as the input of TransH, whose distance function is defined as:

$$d(a, r, s) = \left\| a_{\perp} + d_r - s_{\perp} \right\|_2^2$$

where $d_r$ represents the translation vector of emotion polarity $r$ on the relationship plane, and $a_{\perp}$ and $s_{\perp}$ are the projections of the aspect word vector $e_a$ and the sentiment word vector $e_s$ onto the relationship plane:

$$a_{\perp} = e_a - w_r^{\top} e_a \, w_r, \qquad s_{\perp} = e_s - w_r^{\top} e_s \, w_r$$

where $w_r$ is the normal vector of the relation plane, with $\|w_r\|_2 = 1$.
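The TransH distance can be sketched in a few lines. The sketch assumes the standard TransH formulation: both entity vectors are projected onto the hyperplane whose normal vector belongs to the relation, and the squared Euclidean distance is taken after adding the relation's translation vector. The normal vector is normalized inside the function, and all vectors are toy values.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def project(vector, unit_normal):
    """Project a vector onto the hyperplane with the given unit normal."""
    scale = dot(unit_normal, vector)
    return [x - scale * n for x, n in zip(vector, unit_normal)]


def transh_distance(e_a, e_s, d_r, w_r):
    """Squared distance ||a_perp + d_r - s_perp||^2 on the relation hyperplane."""
    norm = math.sqrt(dot(w_r, w_r))
    unit_normal = [w / norm for w in w_r]  # enforce ||w_r|| = 1
    a_perp = project(e_a, unit_normal)
    s_perp = project(e_s, unit_normal)
    return sum((a + d - s) ** 2 for a, d, s in zip(a_perp, d_r, s_perp))


e_aspect = [1.0, 2.0, 5.0]
e_sentiment = [2.0, 2.0, -3.0]
normal = [0.0, 0.0, 1.0]  # the hyperplane is the x-y plane
print(transh_distance(e_aspect, e_sentiment, [1.0, 0.0, 0.0], normal))
print(transh_distance(e_aspect, e_sentiment, [0.0, 0.0, 0.0], normal))
```

Because the third coordinate is removed by the projection, the two vectors differ only by the translation `[1, 0, 0]` on the hyperplane, so the first distance is zero even though the original vectors are far apart. This is what allows one entity to take different roles under different relations.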
  3.3. Training Target
The node representations of the aspect words and sentiment words after TransH encoding are denoted $e_{asp}$ and $e_{sent}$. Finally, the inner product of the target aspect word and the target sentiment word is calculated to predict the matching score of the aspect and the sentiment:

$$y(asp, sent) = e_{asp}^{\top} e_{sent}$$
This paper uses the BPR loss function as the training target for sentiment. The BPR loss assumes that an observed interaction should receive a higher score than an unobserved one; that is, a positive aspect word–sentiment word sample is expected to score higher than a negative aspect word–sentiment word sample. Therefore, during training, the optimized objective function is:

$$L_{BPR} = \sum_{(asp, sent, sent') \in O} -\ln \sigma\big( y(asp, sent) - y(asp, sent') \big)$$

where $O$ represents the training set and $\sigma$ is the sigmoid function. The pair (asp, sent) is a positive case, in which the aspect word and the sentiment word are related; (asp, sent′) is a negative case, in which there is no correlation between the aspect word asp and the sentiment word sent′. $y(asp, sent)$ and $y(asp, sent')$ represent the scores of the positive and negative cases, respectively. The same formula can be used when the relationship between the subject word and the emotion word is neutral or negative.
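For the TransH model, this paper sets the loss function as:

$$L_{TransH} = \sum_{(a, r, s) \in T} \sum_{(a', r, s') \in T'} \big[ \gamma + d(a, r, s) - d(a', r, s') \big]_{+}$$

where $[\gamma + \cdot]_{+} = \max(0, \gamma + \cdot)$ is the margin-based loss with margin $\gamma$, $d(a, r, s)$ is the distance function of positive triples, $d(a', r, s')$ is the distance function of negative triples, $T$ is the set of all triples, and $T'$ is the set of negative triples constructed from them.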
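The final loss function $L$ is the weighted sum of the BPR loss $L_{BPR}$ and the TransH loss $L_{TransH}$:

$$L = \lambda_1 L_{BPR} + \lambda_2 L_{TransH}$$

where $\lambda_1$ and $\lambda_2$ are the weights of the loss functions $L_{BPR}$ and $L_{TransH}$, respectively, and $\lambda_1 + \lambda_2 = 1$.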
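The three training terms can be sketched numerically. The sketch assumes the usual forms of the two losses: BPR as the negative log-sigmoid of the score difference between a positive and a negative pair, and the TransH loss as a hinge on the distance difference with a margin. The margin value, the weight `lambda_1 = 0.5`, and the input numbers are illustrative assumptions, not settings reported in this paper.

```python
import math


def bpr_loss(score_pairs):
    """Sum of -ln(sigmoid(pos - neg)) over (positive_score, negative_score) pairs."""
    return sum(math.log(1.0 + math.exp(-(pos - neg))) for pos, neg in score_pairs)


def margin_loss(distance_pairs, margin=1.0):
    """Sum of max(0, margin + d_pos - d_neg) over (positive, negative) distance pairs."""
    return sum(max(0.0, margin + d_pos - d_neg) for d_pos, d_neg in distance_pairs)


def total_loss(l_bpr, l_transh, lambda_1=0.5):
    """Weighted sum of the two losses with lambda_1 + lambda_2 = 1."""
    lambda_2 = 1.0 - lambda_1
    return lambda_1 * l_bpr + lambda_2 * l_transh


l_bpr = bpr_loss([(2.0, 0.0)])
l_transh = margin_loss([(0.2, 2.0), (1.0, 0.5)], margin=1.0)
print(l_bpr, l_transh, total_loss(l_bpr, l_transh))
```

The first distance pair already satisfies the margin and contributes nothing, whereas the second pair, where the positive triple is farther than the negative one, is penalized.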
  4. Experiments and Results Discussion
  4.1. Dataset and Experiment Setup
To verify the model’s effectiveness, this paper selected three widely used public datasets to evaluate the proposed model: Restaurant14 (Rest14) and Laptop14 (Lap14) from SemEval-2014 Task 4 and Restaurant15 (Rest15) from SemEval-2015. Each is labeled with three sentiment labels (positive, neutral, and negative), and the specific statistics are shown in Table 1. Among them, Lap14 has the largest share of long text (sentence length greater than 40), accounting for 19.48%. The performance of the model on Lap14 can therefore be expected to be lower than on Rest14 and Rest15; however, Lap14 better reflects the performance of the model on long and difficult sentences. At the same time, short text (sentence length of less than 20) makes up the largest share of Rest14 and Rest15, accounting for 57.69% and 68.96% of the total, respectively.
In the experiments, the number of transformer layers was L = 12 and the hidden size was 768. The learning rate was $2 \times 10^{-5}$. The batch size was set to 25 for Lap14 and 16 for Rest14 and Rest15. We trained the model for up to 1500 steps. After 1000 training steps, we performed model selection on the development set every 100 steps according to the micro-averaged F1 score. The experiments were run in PyCharm with Python 3.6.
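For reference, a micro-averaged F1 score pools true positives, false positives, and false negatives over all sentences before computing precision and recall. The sketch below assumes that predictions and gold labels are compared as sets of (aspect, polarity) pairs per sentence; this matching criterion and the function name `micro_f1` are assumptions made for illustration, since the exact evaluation script is not described here.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-sentence sets of (aspect, polarity) pairs."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # correct pairs
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # spurious pairs
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # missed pairs
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [
    {("waiters", "positive"), ("pasta", "neutral")},
    {("battery", "negative")},
]
predicted = [
    {("waiters", "positive"), ("pasta", "negative")},
    {("battery", "negative")},
]
score = micro_f1(gold, predicted)
print(score)
```

In this toy case, two of the three predicted pairs are correct and two of the three gold pairs are found, so precision, recall, and F1 are all 2/3.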
  4.2. Model Comparisons
This article compares the following methods:
		
- CMLA [30]-ALSTM [31]: CMLA is good at aspect extraction tasks. ALSTM performs well in aspect feature classification tasks. Combining CMLA-ALSTM can be used for ABSA tasks. 
- AHGCN [32]: It adds heterogeneous graphs of multiple relationships to GCN and uses them for ABSA tasks. 
- MNN [33]: It designs a tagging scheme to integrate ABSA tasks. 
- INABSA [18]: It utilizes a unified labeling scheme to integrate the two subtasks of ABSA. 
- DREGCN [34]: It adds a message-passing mechanism to the GCN to enhance the relationship representation. 
- BERT + GRU [35]: It combines BERT and GRU for end-to-end ABSA tasks. 
- DOER [36]: The model uses a double-cross shared RNN unit for feature learning. 
- R-GAT [37]: This method encodes the aspect-based tree structure for sentiment prediction with a graph attention network. 
- MTMVN [38]: It is a multi-view learning model that provides new ideas for ABSA tasks. 
- IACapsNet [39]: This model uses an EM routing algorithm to cluster sentiment features and uses a capsule network to model features. 
- RACL [40]: It proposes a relation-aware collaborative learning network that combines aspect–sentiment term extraction and aspect–sentiment classification tasks. 
Table 2 and Figure 3 show the comparison results between our method and the above methods. First, our model outperforms the two pipeline methods. Compared with DECNN-dTrans, it achieves absolute gains of 18.82%, 13.95%, and 15.53% in F1-score on the Lap14, Rest14, and Rest15 datasets, respectively. The comparative experiments also show that most end-to-end models outperform traditional pipeline methods, which opens a new direction for the future development of ABSA tasks. Compared with R-GAT, however, the accuracy of the proposed method on the Rest14 dataset is not improved. This is because the proposed method integrates the grammatical information of sentences, and the Rest14 dataset contains a large number of ungrammatical sentences, which affects the performance of the model.
 Our method outperforms the DOER model using the sentiment vocabulary on both evaluation metrics. DOER combines aspect sentiment classification and aspect term extraction to obtain the final end-to-end labels, but it cannot flexibly handle various cases of labeling results. The latter three models use the sentiment word tagging method to perform the aspect word extraction, sentiment word extraction, and aspect sentiment classification tasks. In the absence of comment word annotation, our method achieves better results.
Compared with other end-to-end methods, the proposed SABKG also improves on the F1 scores of MNN by 20.58%, 13.95%, and 13.75% on the three benchmark datasets. MNN is the most typical end-to-end multi-task ABSA algorithm. However, since it performs end-to-end ABSA under a unified labeling scheme, the interaction between the two tasks of aspect extraction (AE) and aspect–sentiment classification (ASC) is not considered. Meanwhile, although INABSA adds the aspect word extraction task for unified labeling, it ignores vital sentiment information. The method in this paper therefore outperforms both the MNN model and the INABSA model, which shows that adding RGCN’s ability to encode unstructured data into the model can effectively extract the linguistic knowledge hidden in the text and further improves the ability to combine the two sub-tasks of AE and ASC.
  4.3. Comparison with BERT-BASED Model
This paper also compared the model based on the BERT encoder to verify the effectiveness and universality of the model. The experimental results are shown in 
Table 3 and 
Figure 4.
BERT-GRU, GBM-BERT, R-GAT-BERT, and RACL-BERT are all models based on the BERT encoder. They use a pre-trained model to perform ABSA tasks, and the feature extraction layers of BERT make them perform better than non-BERT models. However, their average performance on the two datasets Lap14 and Rest14 is still lower than that of the SABKG model proposed in this paper. In the SABKG model, we add the part-of-speech vector to extract the hidden information in the text, thus avoiding the interference of other words. This shows that our strategy of using a knowledge graph to extract aspect word embedding information is effective. GBM-BERT combines GBM and BERT and uses a novel gating mechanism to perform ABSA tasks. This gating mechanism can filter out useless information and improve model performance, and GBM-BERT performs better than BERT-GRU. However, because it cannot recognize the semantic information of the text, it does not improve on the performance of the BERT encoder. RACL-BERT incorporates the idea of multi-task learning, which significantly improves on the performance of the traditional baseline model on the two datasets. R-GAT-BERT uses the graph attention network to perform the ABSA task, and its performance improves significantly after it is combined with the BERT encoder. On the Rest14 dataset, the F1 value of R-GAT-BERT is 3.53% higher than that of SABKG. This is because it relies on the tree structure to focus on the aspects of the target. However, because there are many long and difficult sentences in the Lap14 dataset, R-GAT-BERT cannot focus on the correct aspects when processing these sentences. SABKG can capture local and global context information and performs strongly. In the last layer of the BERT encoder, the context hidden vector and aspect hidden vector are extracted simultaneously, and the part-of-speech vector is added between the BERT output and the CRF input, which enhances the linguistic knowledge representation. 
The interactive features of context and aspect words were further extracted using the RGCN network. Using the Laptop 14 and Restaurant 14 datasets, the domain adaptation of the BERT pre-training and fine-tuning stages is improved to achieve the best results.
  4.4. Ablation Experiment
In this section, a series of variants was designed to verify the effectiveness of the proposed components. The experimental results are shown in 
Table 4 and 
Figure 5; one module or policy was deleted at a time.
First, this paper removed the TransH module from the model. The F1 scores drop significantly on all three datasets, suggesting that TransH provides useful aspect information for detecting aspect boundaries and enhances the edge learning of the sentiment polarity of aspect words. Likewise, the Acc score drops when CRF is removed. The CRF module acquires basic emotional knowledge by detecting sentiment words. TransH and CRF play distinct but essential roles, setting the stage for the subsequent steps.
After removing the RGCN module, the F1 score drops on all datasets. If the RGCN module is removed, the relationship between aspects, sentiment words, and sentiment polarity triples extracted from the BERT module cannot be parsed. When the sentiment words are far from the aspect words, the opinion features of the sentiment words are lost. The RGCN module can better capture the connection between aspect and sentiment words and learn the inner relationship between them.
To further verify the role of RGCN in ABSA, this paper removed nodes from the knowledge graph (the edges connected to them also disappear) and repeated the same experiments on the knowledge graph with deleted nodes. As seen from Figure 4, the model performs worse once the nodes are removed. On the two datasets, the Acc of the model decreased by 3.95% and 5.47%, and the F1 decreased by 14.71% and 8.33%, respectively, indicating that the contribution of the RGCN module to the model in this paper is very significant. As seen from the table, the Acc and F1 metrics decrease on all three datasets, which indicates that removing nodes has a more significant impact on the model than removing CRF and TransH. One possible reason is that nodes learn the polarity representations of aspect and sentiment words and then propagate this representation information to other nodes and integrate it into the sub-prediction module. Therefore, the knowledge graph constructed with RGCN can improve the prediction of the emotional polarity between aspect words and emotion words. This reflects the necessity of using the graph neural network to learn the nodes and edges of the aspect word–sentiment word graph.
  5. Conclusions
This paper proposes the SABKG model for improving the ABSA task, and two essential conclusions can be drawn from the analysis. First, this paper combines the aspect extraction task and the aspect sentiment classification task and completes the ABSA task through the interaction between the two. Unlike general BERT-based models, the method in this paper integrates part-of-speech information into the output representation of BERT and obtains the semantic feature information of the input text through linguistic knowledge, which addresses the difficulty BERT has in describing the linguistic knowledge contained in the text. Second, this paper learns the embeddings of the “aspect word, sentiment polarity, sentiment word” triplets through RGCN, which enriches the contextual relationship between the aspect word and the sentiment word in the text and thus better predicts the aspect–sentiment polarity. The experimental results on three open datasets show that the proposed model achieves state-of-the-art performance compared with previous models. Further analysis shows that a learning method that fuses semantic information and uses knowledge graphs performs better on ABSA than the standard end-to-end multi-task learning method, which supports the effectiveness of the proposed method.