Article

Multi-Task Learning Model Based on BERT and Knowledge Graph for Aspect-Based Sentiment Analysis

1 School of Electrical Engineering, Guizhou University, Guiyang 550025, China
2 Key Laboratory of “Internet +” Collaborative Intelligent Manufacturing in Guizhou Province, Guiyang 550025, China
3 Science and Technology Department of Guizhou Province, Yunyan District, Guiyang 550000, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2023, 12(3), 737; https://doi.org/10.3390/electronics12030737
Submission received: 29 December 2022 / Revised: 25 January 2023 / Accepted: 27 January 2023 / Published: 1 February 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

Aspect-based sentiment analysis (ABSA) aims to identify the sentiment of an aspect in a given sentence and can thus provide people with comprehensive information. However, many conventional methods fail to discover the linguistic knowledge implicit in sentences and are susceptible to unrelated words. To improve performance on the ABSA task, this paper proposes a multi-task sentiment analysis model based on Bidirectional Encoder Representations from Transformers (BERT) and a knowledge graph (SABKG). Specifically, part-of-speech information is incorporated into the output representation of BERT, so that textual semantic information is obtained through linguistic knowledge; this also enhances the textual representation used to identify aspect terms. Moreover, this paper constructs a knowledge graph of aspect and sentiment words and uses a graph neural network to learn the embeddings in the triple of “aspect word, sentiment polarity, sentiment word”. The constructed graph enriches the contextual relationship between the text’s aspect and sentiment words. The experimental results on three open datasets show that the proposed model achieves state-of-the-art performance compared with previous models.

1. Introduction

Sentiment analysis refers to the study of people’s opinions, emotions, and attitudes towards services, events, products, topics, and their attributes. With the rapid development of social media, sentiment analysis has become a basic task in natural language processing [1]. According to the granularity of the text studied, it can be divided into three types: text-level sentiment classification, sentence-level sentiment classification, and aspect-based sentiment analysis (ABSA). The ability to accurately judge the emotional polarity of different emotional subjects in a sentence has become an important research direction in sentiment analysis [2]. For example, in the comment “The waiters are very friendly, and the pasta is simply average”, there are two aspect words, “waiters” and “pasta”; their emotional polarities are positive and neutral, respectively, and the corresponding opinion words are “friendly” and “simply average”. Aspect-level sentiment classification is promising in various industries. For example, in catering services, restaurants can quickly and deeply analyze users’ consumption habits and personal preferences from product reviews [3], and such analysis also makes it quick to understand the public’s attitude towards a policy or opinion on a hot topic [4].
In some early works, researchers trained Support Vector Machine (SVM) classifiers on manually designed feature sets to solve aspect-level sentiment analysis tasks [5,6,7]. Later, neural network-based methods were widely used for this task due to their flexible structures and automatic feature extraction capabilities. Most neural network models are based on long short-term memory (LSTM) [8,9]. However, LSTM extracts features from left to right or right to left in the text, which is not a truly bidirectional extraction method. To focus on the parts of a sentence relevant to a particular aspect, many attention-based models have emerged [10,11,12,13], and various BERT-based language models [14] have achieved state-of-the-art performance in many NLP tasks. Masked language modeling can incorporate context from both directions, making BERT a pre-trained deep bidirectional transformer. Nevertheless, in most models, the aspect word and context word vectors produced by the intermediate hidden layers are used as an explicit representation of the context [15] and are further used to compute the aspect score. However, it is difficult for BERT to describe the linguistic knowledge in the text, so much potential emotional information is lost.
Based on these observations, this paper proposes a sentiment analysis model based on BERT and a knowledge graph (SABKG). First, the model incorporates part-of-speech information into the output representation of BERT, obtains textual semantic information through linguistic knowledge, enhances the text representation, and better identifies aspect objects through BERT. Then, the model constructs the triple of “aspect word, sentiment polarity, sentiment word” and builds a knowledge graph of aspect and sentiment words, where the aspect and sentiment words are nodes and the sentiment polarity is the edge. The embeddings of nodes and edges in the triples are learned with RGCN and used as the input to TransH. The proposed framework enriches the contextual relationship between aspect and sentiment words in the text, strengthens their representations, and predicts sentiment polarity.
This paper evaluates the proposed aspect-based sentiment analysis model SABKG on three public benchmark datasets. The experimental results show that the model performs better than other previous models. The main contributions of this paper are as follows:
  • Adds part-of-speech features to the input of BERT. Specifically, the part-of-speech features obtained by word2vec are added to the input of BERT to learn the parts of speech of subject words and sentiment words. This improves the model’s ability to exploit linguistic knowledge to a certain extent.
  • Uses a combination of word vectors, position vectors, and part-of-speech vectors to construct triples of “aspect word, sentiment polarity, sentiment word” through BERT. This enriches the part-of-speech features for entity recognition and part-of-speech tagging, preparing for downstream learning.
  • Takes the triples output by BERT+CRF as the input of RGCN. The embedding representations of the aspect word, sentiment word, and sentiment polarity in each triple are learned through the graph neural network, and the three embeddings are input into TransH to strengthen the representation of words with different sentiment polarities.
The rest of this article is organized as follows. Related methods for aspect-based sentiment analysis are reviewed in Section 2, and their advantages and disadvantages are discussed. The SABKG model structure is introduced in detail in Section 3. Section 4 verifies the model’s effectiveness experimentally and compares its performance with other methods. Section 5 summarizes the work of this paper.

2. Related Work

ABSA aims to identify the emotional polarity of an aspect of a given text. Most early works constructed feature sets by manually labeling features, including aspect, dictionary, and parsing feature sets, to train an SVM for sentiment analysis [5,6,7]. However, creating such feature sets is very challenging due to the immense workload of manual labeling. With the development of neural networks, more and more neural methods have been used for this task. The LSTM model can learn long-term dependencies thanks to its strong ability to forget, memorize, and update information, and it also alleviates the vanishing gradient problem [9,10,11,12,13,14,15,16]. Basiri et al. [17] proposed an attention-based bidirectional CNN-RNN model, which obtains temporal context information through bidirectional GRU and LSTM layers. Li et al. [18] used a two-layer LSTM model for the ABSA task, concatenated the last hidden states of the two LSTMs as the classification feature, and finally integrated the emotional features of the context into the aspect representation. However, the one-way feature extraction of LSTM cannot truly capture the global context.
Some researchers use external knowledge to enhance semantic representations and improve model performance. Ma et al. [8] added external features from SenticNet to the original LSTM structure to strengthen its ability for sentiment analysis. To enhance the sentiment polarity prediction of LSTM for input text, Teng et al. [19] performed manual polarity classification of the sentiment words in each sentence. Tay et al. [20] utilized attention-gated units to learn embeddings by combining a sentiment lexicon with sentence information. A human-like approach that takes sentiment grammar knowledge as prior knowledge was proposed for aspect-level sentiment classification [21]. Ghosal et al. [22] proposed a domain-adaptive model that leverages external commonsense-related information to improve sentiment analysis performance. Chen et al. [23] designed a sentiment analysis framework, KNEE, which combines sentiment classification and the recognition of aspect–sentiment pairs into one text classification task. Fares et al. [24] introduced an unsupervised word-level sentiment analysis framework that computes and propagates sentiment scores in lexical sentiment graphs. Cambria et al. [25] built a new sentiment analysis knowledge base with integrated applications of symbolic artificial intelligence. A recent trend in graph embedding techniques is to embed whole networks rather than individual nodes [26], which can also be used for sentiment analysis. Existing sentiment lexicon methods have similar motivations but struggle to describe aspect–sentiment relationships for sentiment recognition and classification.
With the continuous development of language models, the pre-trained text model BERT proposed by Devlin et al. [27] performs well in text classification tasks. It uses a bidirectional transformer to build the overall network and learns context information on both the left and right sides. At the same time, during pre-training, text vector features can be learned from shallow to deep layers. Compared with RNN and LSTM, BERT can run in parallel and extract information at different levels, reflecting more comprehensive sentence semantics. Sun et al. [28] used aspects to construct interrogative sentences and combined them with the BERT model to propose models such as BERT-pair-QA-M; their experimental results show that the classification results on short-text ABSA datasets are better than those of neural network algorithms. Gao et al. [29] proposed the TD-BERT model based on BERT and further improved classification performance by extracting features from specific positions of the BERT coding layer for aspect-level sentiment classification, indicating that BERT has superior feature extraction performance. However, previous BERT-based research did not solve the problem that BERT struggles to capture the large amount of linguistic knowledge in text; it simply extracts text features, losing much potential emotional information.
In this paper, we combined BERT and RGCN to enable our model to exploit the syntactic structure information of sentences. Due to the complexity of the association between text contents, we constructed the “aspect word, sentiment polarity, sentiment word” triplet and used the RGCN to learn the embeddings in the triplet and build the knowledge graph of the aspect word and sentiment word. The context relationship between subject words and emotion words was further extracted to improve the accuracy of emotion polarity prediction.

3. Proposed Method

This paper adopts a two-stage model design, and the overall design of the model is shown in Figure 1. In the first stage, the model extracts, through the BERT and CRF modules, the target subject words in all reviews, the sentiment words that may describe those target words, and the aspect-term polarity of the target words. First, the processed text is encoded by the BERT encoder to obtain the corresponding word vectors $h_i$, and the part of speech of each word is input into word2vec to obtain the part-of-speech embeddings $w_i$; their mean is taken as the input of the CRF. The second stage constructs the “aspect word, sentiment polarity, sentiment word” triple and uses RGCN to build the embeddings of the aspect word, sentiment polarity, and sentiment word. Here, rel_i denotes the heterogeneous graph of a given relation type and serves as the input of RGCN, and ReLU is the activation function. Finally, TransH processes the output of RGCN to improve the accuracy of predicting the emotional polarity corresponding to each aspect word.

3.1. BERT-CRF Aspect and Sentiment Word Recognition

The first layer of the model uses the pre-trained BERT language model to initialize the word vectors of the input text as the sequence $X = \{x_1, x_2, \ldots, x_n\}$, which effectively extracts text features by using the relationships between words. The second layer uses the average of the word vectors generated by BERT and the part-of-speech vectors as the input of the CRF. The Language Technology Platform (LTP) tool segments the input text, and the BIO template then marks the segmentation result to indicate phrase boundaries: if a word is the starting word of a phrase, it is marked with a “B” label; if it is a subsequent word of the phrase, it is marked “I”; if it does not belong to a phrase, it is marked “O”. After obtaining the segmentation labels, word embedding maps them into a vector space so that the CRF can label the sequence data and predict the corresponding state sequence from the input sequence, considering both the current state features of the input and the transfer features of each label category. The CRF then finds, based on the prediction output of the BERT model, the sequence that optimizes the objective function.
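As a concrete illustration of this fusion step (a minimal sketch, not the authors’ code; the arrays `h` and `w` stand in for the BERT outputs and word2vec part-of-speech embeddings described above):

```python
import numpy as np

def fuse_features(bert_vecs: np.ndarray, pos_vecs: np.ndarray) -> np.ndarray:
    """Average the BERT word vectors h_i and the POS vectors w_i
    element-wise to form the CRF input vectors k_i."""
    assert bert_vecs.shape == pos_vecs.shape  # one vector per token
    return (bert_vecs + pos_vecs) / 2.0

# Example: a 5-token sentence with 768-dimensional representations.
h = np.random.randn(5, 768)   # BERT outputs h_1..h_5 (placeholder values)
w = np.random.randn(5, 768)   # word2vec POS embeddings w_1..w_5
K = fuse_features(h, w)       # CRF input k_1..k_5
```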

3.1.1. Entity Recognition

A user comment may contain both positive and negative remarks. For example, “Waiters are very friendly and the pasta is simply average” mentions two comment subjects, “waiters” and “pasta”, whose sentiment polarities are positive and neutral, respectively. At the same time, the main words expressing these emotions are “friendly” and “average”. Therefore, we first need to extract the aspect words and the sentiment words that explain the sentiment polarity in the text.
First, the BIO template labels the text sequence. Sequence labeling assigns a label to each word: B (begin) represents the beginning of an entity; I (inside) represents the middle or end of an entity; O (outside) represents words not belonging to an entity. For example, “John Smith lives in New York” could be labeled “B-name, I-name, O, O, B-name, I-name”; here, the B-name and I-name spans, John Smith and New York, are the subjects.
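As a concrete illustration (a toy sketch, not part of the original pipeline), the BIO labels of this example can be decoded back into entity spans:

```python
# Decode BIO labels into entity spans for the example above.
tokens = ["John", "Smith", "lives", "in", "New", "York"]
labels = ["B-name", "I-name", "O", "O", "B-name", "I-name"]

entities, current = [], []
for tok, lab in zip(tokens, labels):
    if lab.startswith("B-"):              # a new entity begins
        if current:
            entities.append(" ".join(current))
        current = [tok]
    elif lab.startswith("I-") and current:
        current.append(tok)               # continue the open entity
    else:                                 # "O" closes any open entity
        if current:
            entities.append(" ".join(current))
        current = []
if current:
    entities.append(" ".join(current))

print(entities)  # ['John Smith', 'New York']
```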
The word vector sequence $X = \{x_1, x_2, \ldots, x_n\}$ of the processed text is the input of BERT, and the part-of-speech sequence $W = \{w_1, w_2, \ldots, w_n\}$ of the words in the text is encoded by word2vec as described in Section 3.1.2. The vector $h$ is the word vector of the sentence processed by BERT. The new vector obtained by fusing the vectors $w$ and $h$ and taking their mean is used as the input of the CRF module:
$$K = \mathrm{avg}\big( h(h_1, h_2, \ldots, h_i) + w(w_1, w_2, \ldots, w_i) \big).$$
The hidden context vectors $K = (k_1, k_2, \ldots, k_i)$ could be used directly as features so that each output $y_i$ of the label sequence $Y = (y_1, y_2, \ldots, y_i)$ makes an independent labeling decision. Combined with the state transition matrix in the CRF, an optimal global sequence is obtained from the adjacent labels, and the score of a label sequence can be expressed as:
$$s(K, y) = \sum_{i=1}^{n} \Big( Z_{y_i, y_{i+1}} + P_{i+1, y_{i+1}} \Big).$$
Here, $Z$ is the transition matrix, $Z_{y_i, y_{i+1}}$ is the score of the transition from label $y_i$ to label $y_{i+1}$, and $P_{i+1, y_{i+1}}$ is the score of the $(i{+}1)$-th word in the input sequence taking label $y_{i+1}$. The probability of the label sequence $y$ is calculated as:
$$p(y \mid K) = \frac{\exp\big(s(K, y)\big)}{\sum_{\tilde{y} \in Y_K} \exp\big(s(K, \tilde{y})\big)},$$
where $Y_K$ is the set of all possible label sequences; the final output is the label sequence with the highest probability.
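To make the scoring concrete, the following sketch (toy sizes, random scores; not the authors’ implementation) computes $s(K, y)$ and normalizes it over all candidate label sequences by brute force, where a practical CRF would use the forward algorithm instead:

```python
from itertools import product
import numpy as np

n_labels, n_tokens = 3, 4                 # toy label set: 0=B, 1=I, 2=O
Z = np.random.randn(n_labels, n_labels)   # transition scores Z[y_i, y_{i+1}]
P = np.random.randn(n_tokens, n_labels)   # per-token label scores P[i, y_i]

def crf_score(y):
    """s(K, y): sum of transition and emission scores along the path."""
    return sum(Z[y[i], y[i + 1]] + P[i + 1, y[i + 1]]
               for i in range(len(y) - 1))

# p(y | K): brute-force normalization over every candidate label sequence.
all_seqs = list(product(range(n_labels), repeat=n_tokens))
y = (0, 1, 2, 2)                          # e.g. "B I O O"
log_norm = np.log(sum(np.exp(crf_score(s)) for s in all_seqs))
p_y = np.exp(crf_score(y) - log_norm)
```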

3.1.2. Sentiment Word Recognition

To identify the sentiment words that describe the target aspect, the Natural Language Toolkit (NLTK) is used to tag the parts of speech of the text. Considering that words describing a target aspect are generally adjectives, adverbs, interjections, and similar types, this paper extracts adjectives, comparative adjectives, superlative adjectives, etc., from the text. Let $Y_{sent\_set}$ = {adj, adv, adj_com, adj_sup, adv_com, adv_sup} be the ordered set of sentiment word types. Starting from each target subject identified in Section 3.1.1, the short sentence around aspect word $i$ is scanned from the center outward within a radius $r = 8$. If the first scanned word satisfies $y_{sent} \in Y_{sent\_set}$, then $y_{sent}$ is the sentiment word corresponding to aspect word $i$. Multiple subject words can be queried by the same rule. See Algorithm 1 for details; a runnable sketch follows the listing.
Algorithm 1. Recognition of sentiment words
Input: Aspect word i
Output: Sentiment word y
1: for elem in S do
2:   %S is sentence set, elem is sentence
3:   asp_set = get_asp(elem)
4:   for i in asp_set do
5:    for k in range(1, r + 1) do
6:     if y = get_pob(i + k) and in_Ysent == True then
7:      %get_pob(i + k) means to obtain the part of speech of the kth word from the right side of the aspect word i
8:      y is the sentiment word corresponding to i
9:     else if y = get_pob(i-k) and in_Ysent == True then
10:      y is the sentiment word corresponding to i
11:     else
12:      no sentiment words
13:     end if
14:    end for
15:   end for
16: end for
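A runnable sketch of Algorithm 1 is given below. It assumes NLTK’s Penn Treebank tag set stands in for Ysent_set; `find_sentiment_word` is an illustrative helper, not the authors’ code:

```python
import nltk
# One-time setup:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

SENT_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}  # adjective/adverb tags
RADIUS = 8

def find_sentiment_word(tokens, tags, aspect_idx):
    """Scan outward from the aspect word within the radius and return
    the first adjective or adverb encountered (right side checked first)."""
    for k in range(1, RADIUS + 1):
        for idx in (aspect_idx + k, aspect_idx - k):
            if 0 <= idx < len(tokens) and tags[idx] in SENT_TAGS:
                return tokens[idx]
    return None  # no sentiment word within the radius

sentence = "The waiters are very friendly and the pasta is simply average"
tokens = nltk.word_tokenize(sentence)
tags = [tag for _, tag in nltk.pos_tag(tokens)]
# Note: the nearest hit may be an intensifier such as "very"; the scan
# simply returns the first adjective/adverb it reaches.
print(find_sentiment_word(tokens, tags, tokens.index("waiters")))
```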

3.2. Building a Knowledge Graph with RGCN

3.2.1. Construct a Heterogeneous Graph of “Aspect Word, Sentiment Polarity, Sentiment Word”

Given a sentence $X = (x_0, \ldots, x_i, \ldots, x_{n-1})$, the aspect term extraction model first extracts a set of aspects $A = \{a_0, \ldots, a_j, \ldots, a_{m-1}\}$. For each extracted aspect $a_j$, the aspect-oriented sentiment word extraction model extracts its sentiment words $S_j = \{s_j^0, \ldots, s_j^k, \ldots, s_j^{l_j - 1}\}$, where $l_j \geq 0$ is the number of sentiment words for the $j$-th aspect. Finally, for each extracted aspect–opinion pair $(a_j, s_j^k)$, its sentiment $r_j^k \in P = \{\text{positive, neutral, negative}\}$ is predicted by the aspect–opinion pair sentiment classification model. The triples are obtained by combining the results of the three models: $T = \{(a_0, r_0^0, s_0^0), \ldots, (a_{m-1}, r_{m-1}^{l_{m-1}}, s_{m-1}^{l_{m-1}})\}$. The prediction module is integrated into the whole model as a sub-task. A small sketch of this triple assembly is given below.
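In the sketch, the three sub-models are stubbed out with hypothetical callables; only the combination logic is shown:

```python
def build_triples(aspects, opinions_for, polarity_of):
    """Combine the outputs of the three sub-models into triples T."""
    triples = []
    for a in aspects:                                   # A = {a_0, ..., a_{m-1}}
        for s in opinions_for(a):                       # S_j for the j-th aspect
            triples.append((a, polarity_of(a, s), s))   # r in {pos, neu, neg}
    return triples

triples = build_triples(
    ["waiters", "pasta"],
    lambda a: {"waiters": ["friendly"], "pasta": ["average"]}[a],
    lambda a, s: "positive" if s == "friendly" else "neutral",
)
print(triples)
# [('waiters', 'positive', 'friendly'), ('pasta', 'neutral', 'average')]
```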

3.2.2. Construction of Knowledge Graph

According to RGCN, a heterogeneous graph of “aspect word, sentiment polarity, sentiment word” triples is constructed, as shown in Figure 2, taking “Waiters are very friendly, and the pasta is simply average, and the taste is just right” as an example. Nodes represent aspect and sentiment words, and edges represent sentiment polarity relations (positive, negative, neutral). The heterogeneous graph is used to identify the emotional relationships in the sentence and serves as the input of RGCN.
The triples obtained in the previous step are input into the RGCN model, where the aspect and sentiment words are nodes and the sentiment polarity is the edge. The propagation model is as follows:
$$h_i^{(l+1)} = \sigma \left( \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right),$$
where $h_i^{(l)}$ is the hidden representation of node $v_i$ at layer $l$, $N_i^r$ is the set of neighbors of node $i$ under relation $r$, $W_r^{(l)}$ is the weight matrix of relation $r$, and $c_{i,r}$ is a normalization constant.
To prevent the model from overfitting, this paper decomposes each relation weight matrix into a shared basis so that parameters are shared across relation weights:
$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}.$$
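A minimal numpy sketch of one RGCN layer with this basis decomposition (the toy graph and sizes are illustrative assumptions, not the paper’s configuration):

```python
import numpy as np

def rgcn_layer(H, neighbors, V, a, W0):
    """One RGCN propagation step.
    H: (n_nodes, d) node features; neighbors[r][i]: neighbor list of node i
    under relation r; V: (B, d, d) basis matrices; a: (n_rel, B) coefficients."""
    W = np.einsum("rb,bij->rij", a, V)        # W_r = sum_b a_rb * V_b
    out = H @ W0.T                            # self-connection term W_0 h_i
    for r, per_node in enumerate(neighbors):
        for i, js in enumerate(per_node):
            for j in js:
                out[i] += (W[r] @ H[j]) / len(js)   # 1/c_{i,r} normalization
    return np.maximum(out, 0.0)               # ReLU activation

# Toy graph: 3 nodes, 2 relations (e.g. positive/neutral edges), B = 2 bases.
H = np.random.randn(3, 4)
V = np.random.randn(2, 4, 4)
a = np.random.randn(2, 2)
W0 = np.random.randn(4, 4)
neighbors = [[[1], [], []], [[], [2], []]]    # rel 0: 0->1 ; rel 1: 1->2
H1 = rgcn_layer(H, neighbors, V, a, W0)
```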
Since what is observed is not the complete set of edges $E$ but an incomplete subset $\bar{E}$, a score $f(a, r, s)$ is assigned to each possible edge $(a, r, s)$ to determine how likely it is to belong to $E$. This paper uses DistMult factorization as the scoring function: each relation is associated with a diagonal matrix $R_r \in \mathbb{R}^{d \times d}$, and a triple $(a, r, s)$ is scored as:
$$f(a, r, s) = e_a^{T} R_r e_s,$$
where $e_i \in \mathbb{R}^d$ is the real-valued vector to which the encoder maps each aspect entity $v_i \in V$.
To balance positive and negative samples, $\omega$ negative samples are constructed for each positive sample (existing edge). A negative sample can be constructed by connecting completely unrelated nodes, or by replacing the entity $a$ or $s$ in an existing triple $(a, r, s)$ with another high-frequency entity. This paper uses the cross-entropy loss function for optimization:
$$\mathcal{L} = -\frac{1}{(1+\omega)|\bar{E}|} \sum_{(a, r, s, y) \in T} y \log l\big(f(a, r, s)\big) + (1 - y) \log \Big(1 - l\big(f(a, r, s)\big)\Big),$$
where $T$ is the total set of true and corrupted triples, $l$ is the logistic sigmoid function, and $y$ is an indicator: $y = 1$ for positive triples and $y = 0$ for negative triples. A sketch of this scorer and loss is given below.
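The sketch uses random embeddings and hand-made samples; in the actual model, the scorer operates on the RGCN outputs:

```python
import numpy as np

def distmult(e_a, r_diag, e_s):
    """f(a, r, s) = e_a^T R_r e_s, with the diagonal R_r stored as a vector."""
    return float(e_a @ (r_diag * e_s))

def edge_loss(samples, e, R):
    """Cross-entropy over (a, r, s, y) samples; y=1 positive, y=0 corrupted."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    total = 0.0
    for a, r, s, y in samples:
        p = sigmoid(distmult(e[a], R[r], e[s]))
        total += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return -total / len(samples)

e = np.random.randn(4, 8)                # 4 node embeddings (aspect/sentiment words)
R = np.random.randn(3, 8)                # diagonals for positive/neutral/negative
samples = [(0, 0, 1, 1), (0, 0, 3, 0)]   # one true edge, one corrupted edge
loss = edge_loss(samples, e, R)
```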

3.2.3. TransH Output Layer

The embeddings of nodes and edges obtained by RGCN in Section 3.2.2 are used as the input to TransH. Considering that TransH can model one-to-many, many-to-one, and many-to-many relationship patterns in a graph, this paper uses TransH to model the semantic information of the triples. For a triple $(a, r, s)$, the vector representations output by the graph convolutional network are used as the input of TransH, whose distance function is defined as:
$$d(a, r, s) = \left\| h_a^{\perp} + h_r - h_s^{\perp} \right\|_2^2,$$
where $h_r$ is the translation vector of sentiment polarity $r$ on the relation hyperplane, and $h_a^{\perp}$ and $h_s^{\perp}$ are the projections of the aspect word vector $h_a$ and the sentiment word vector $h_s$ onto the relation hyperplane:
$$h_a^{\perp} = h_a - w_v^{T} h_a \, w_v,$$
$$h_s^{\perp} = h_s - w_v^{T} h_s \, w_v,$$
where $w_v$ is the normal vector of the relation hyperplane.
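A numpy sketch of this projection and distance (illustrative vectors; `w_v` must be unit-norm for the projection to be exact):

```python
import numpy as np

def project(h, w_v):
    """Project h onto the relation hyperplane: h - (w_v . h) w_v."""
    return h - (w_v @ h) * w_v

def transh_distance(h_a, h_r, h_s, w_v):
    """d(a, r, s) = || proj(h_a) + h_r - proj(h_s) ||_2^2."""
    diff = project(h_a, w_v) + h_r - project(h_s, w_v)
    return float(diff @ diff)

w_v = np.array([1.0, 0.0, 0.0])   # unit normal of the relation hyperplane
d = transh_distance(np.random.randn(3), np.random.randn(3),
                    np.random.randn(3), w_v)
```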

3.3. Training Target

The node representations of aspect and sentiment words after TransH encoding are denoted $h_{asp}$ and $h_{sent}$. Finally, the inner product of the target aspect word and the target sentiment word is computed to predict the matching score of the aspect and sentiment:
$$y(asp, sent) = h_{asp}^{T} h_{sent}.$$
This paper chooses the BPR loss function as the training objective for sentiment. The BPR loss assumes that observed interactions should receive higher scores than unobserved ones; that is, a positive aspect word–sentiment word sample is expected to score higher than a negative one. During training, the optimized objective function is therefore:
$$L_1 = \sum_{(asp, sent, sent') \in O} -\ln \sigma\big( y(asp, sent) - y(asp, sent') \big),$$
where $O$ denotes the training set, $(asp, sent)$ is a pair in which the sentiment word expresses the positive sentiment of the aspect word, and $(asp, sent')$ is a pair with no correlation between the aspect word $asp$ and the sentiment word $sent'$. $y(asp, sent)$ and $y(asp, sent')$ represent the scores of the positive and negative cases, respectively. When the relationship between the subject word and the sentiment word is neutral or negative, the same formula applies.
For the TransH model, this paper set the loss function as:
$$L_2 = \sum_{(e_1, r, e_2) \in T} \max \big( 0, \; d(e_1, r, e_2) - d(e_1', r, e_2') + m \big),$$
where $L_2$ is a margin-based loss, $d(e_1, r, e_2)$ is the distance of a positive triple, $d(e_1', r, e_2')$ is the distance of a negative triple, $m$ is the margin, and $T$ is the set of all triples.
The final loss function $L$ is a weighted combination of $L_1$ and $L_2$:
$$L = \lambda_1 L_1 + \lambda_2 L_2,$$
where $\lambda_1$ and $\lambda_2$ are the weights of the loss functions $L_1$ and $L_2$, respectively, with $\lambda_1 + \lambda_2 = 1$. A sketch of this joint objective is given below.
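This is a minimal illustration with placeholder inputs, not the actual training code:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def bpr_loss(pairs, h):
    """L1: sum of -ln sigma(y(asp, sent) - y(asp, sent')) over index triples
    (asp, sent, sent'), with y(.,.) the inner product of encoded vectors."""
    return sum(-np.log(sigmoid(h[a] @ h[s] - h[a] @ h[sn]))
               for a, s, sn in pairs)

def margin_loss(pos_d, neg_d, m=1.0):
    """L2: hinge on positive vs. corrupted triple distances with margin m."""
    return sum(max(0.0, dp - dn + m) for dp, dn in zip(pos_d, neg_d))

def total_loss(pairs, h, pos_d, neg_d, lam1=0.5, lam2=0.5):
    """L = lambda_1 * L1 + lambda_2 * L2, with lambda_1 + lambda_2 = 1."""
    return lam1 * bpr_loss(pairs, h) + lam2 * margin_loss(pos_d, neg_d)

h = np.random.randn(4, 8)                     # node embeddings after TransH
loss = total_loss([(0, 1, 3)], h, [0.2], [0.9])
```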

4. Experiments and Results Discussion

4.1. Dataset and Experiment Setup

To verify the model’s effectiveness, this paper selected three widely used public datasets: Restaurant14 and Laptop14 from SemEval-2014 Task 4 and Restaurant15 from SemEval-2015. They are labeled with three sentiment labels, positive, neutral, and negative; the specific statistics are shown in Table 1. Lap14 has the largest proportion of long text (sentence length greater than 40), at 19.48%, which means the model’s performance on Lap14 decreases compared with Res14 and Res15 but better reflects its ability to handle long and difficult sentences. Meanwhile, Res14 and Res15 have the largest proportions of short text (sentence length less than 20), accounting for 57.69% and 68.96% of their totals, respectively.
In the experiments, the number of transformer layers was L = 12 and the hidden size $dim_h$ was 768. The learning rate was 2 × 10−5. The batch size was set to 25 for Lap14 and 16 for Rest14 and Rest15. We trained the model for up to 1500 steps; after 1000 steps, we performed model selection on the development set every 100 steps according to the micro-averaged F1 score. The experimental platform was PyCharm with Python 3.6. These settings are collected in the sketch below.
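For reference, the reported settings can be gathered into a single configuration sketch (the key names are illustrative, not from the authors’ code):

```python
config = {
    "num_transformer_layers": 12,     # L = 12
    "hidden_size": 768,               # dim_h
    "learning_rate": 2e-5,
    "batch_size": {"Lap14": 25, "Rest14": 16, "Rest15": 16},
    "max_steps": 1500,
    "selection_from_step": 1000,      # model selection starts here...
    "selection_every": 100,           # ...every 100 steps, by micro-F1
}
```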

4.2. Model Comparisons

This article compares the following methods:
  • CMLA [30]-ALSTM [31]: CMLA performs well at aspect extraction, and ALSTM performs well at aspect-level sentiment classification; combining them yields a pipeline for the full ABSA task.
  • AHGCN [32]: It adds heterogeneous graphs of multiple relationships to GCN and uses them for ABSA tasks.
  • MNN [33]: It designs a tagging scheme to integrate the ABSA sub-tasks.
  • INABSA [18]: It utilizes a unified labeling scheme to integrate the two subtasks of ABSA.
  • DREGCN [34]: It adds a messaging mechanism to the GCN to enhance the relationship representation.
  • BERT + GRU [35]: It combines BERT and GRU for end-to-end ABSA tasks.
  • DOER [36]: The model uses a dual cross-shared RNN unit for feature learning.
  • R-GAT [37]: This method encodes the tree structure based on aspects for sentiment prediction based on a graph attention network.
  • MTMVN [38]: It is a multi-view learning model that provides new ideas for ABSA tasks.
  • IACapsNet [39]: This model uses an EM routing algorithm to cluster sentiment features and uses a capsule network to model features.
  • RACL [40]: It proposes a relation-aware collaborative learning network that combines aspect–sentiment term extraction and aspect–sentiment classification tasks.
Table 2 and Figure 3 show the comparison between our method and the above methods. First, our model outperforms the two pipeline methods; compared with DECNN-dTrans, it achieves absolute F1 gains of 18.82%, 13.95%, and 15.53% on the Laptop14, Rest14, and Rest15 datasets. The comparative experiments also show that most end-to-end framework models outperform traditional pipeline methods, opening a new track for the future development of ABSA tasks. Meanwhile, compared with R-GAT, the accuracy of our method on the Rest14 dataset is not improved. This is because our method integrates the grammatical information of sentences, and the Rest14 dataset contains many ungrammatical sentences, which affects the performance of the model.
Our method outperforms the DOER model using the sentiment vocabulary on both evaluation metrics. DOER combines aspect sentiment classification and aspect term extraction to obtain the final end-to-end labels, but it cannot flexibly handle various cases of labeling results. The latter three models use the sentiment word tagging method to perform the aspect word extraction, sentiment word extraction, and aspect sentiment classification tasks. In the absence of comment word annotation, our method achieves better results.
Compared with other end-to-end methods, the proposed SABKG also achieves improvements of 20.58%, 13.95%, and 13.75% over the F1 scores of MNN on the three benchmark datasets. MNN is a typical end-to-end multi-task ABSA algorithm; however, since it performs end-to-end ABSA under a unified labeling scheme, the interaction between the two sub-tasks of aspect extraction (AE) and aspect–sentiment classification (ASC) is not considered. Meanwhile, although INABSA adds the aspect word extraction task to the unified labeling, it ignores vital sentiment information. The method in this paper therefore outperforms both the MNN and INABSA models, which shows that adding RGCN’s ability to encode unstructured data into the model can effectively extract the linguistic knowledge hidden in the text, further improving the combination of the AE and ASC sub-tasks.

4.3. Comparison with BERT-Based Models

This paper also compared the model based on the BERT encoder to verify the effectiveness and universality of the model. The experimental results are shown in Table 3 and Figure 4.
BERT-GRU, GBM-BERT, R-GAT-BERT, and RACL-BERT are all models based on the BERT encoder. BERT uses a pre-trained model to perform ABSA tasks, and its feature extraction layers make it outperform non-BERT models. However, their average performance on the Lap14 and Res14 datasets is still lower than that of the SABKG model proposed in this paper. In SABKG, we add the part-of-speech vector to the model to extract the hidden information in the text, thus avoiding interference from unrelated words. This shows that our strategy of using a knowledge graph to extract aspect word embedding information is effective. GBM-BERT combines GBM and BERT and uses a novel gating mechanism for ABSA tasks; this gating mechanism can filter out useless information and improve performance, so it outperforms BERT-GRU. However, because it cannot recognize the semantic information of the text, it does not improve upon the BERT encoder itself. RACL-BERT adopts the idea of multi-task learning, which significantly improves the performance of the traditional baseline on both datasets. R-GAT-BERT performs the ABSA task with a graph attention network, and its performance improves significantly when combined with the BERT encoder. On Rest14, R-GAT-BERT’s F1 is 3.53% higher than SABKG’s, because it relies on the tree structure to focus on the target aspects. However, because the Lap14 dataset contains many long and difficult sentences, R-GAT-BERT cannot focus on the correct aspects when processing them. SABKG can capture both local and global context information and performs strongly. In the last layer of the BERT encoder, the context hidden vector and aspect hidden vector are extracted simultaneously, and the part-of-speech vector is added between the BERT and CRF inputs, which enhances the linguistic knowledge representation. The interactive features of context and aspect words are further extracted by the RGCN network. On the Laptop14 and Restaurant14 datasets, the domain adaptation of the BERT pre-training and fine-tuning stages is improved to achieve the best results.

4.4. Ablation Experiment

In this section, a series of variants was designed to verify the effectiveness of the proposed components, removing one module or strategy at a time. The experimental results are shown in Table 4 and Figure 5.
First, this paper removed the TransH module from the model. The F1 scores drop significantly on all three datasets, suggesting that TransH provides useful aspect information for detecting aspect boundaries and enhances the learning of the sentiment-polarity edges of aspect words. Likewise, the Acc score drops when CRF is removed; the CRF module acquires basic emotional knowledge by detecting sentiment words. TransH and CRF are distinct but essential, setting the stage for the subsequent modules.
After removing the RGCN module, the F1 score drops on all datasets. If the RGCN module is removed, the relationship between aspects, sentiment words, and sentiment polarity triples extracted from the BERT module cannot be parsed. When the sentiment words are far from the aspect words, the opinion features of the sentiment words are lost. The RGCN module can better capture the connection between aspect and sentiment words and learn the inner relationship between them.
To further verify the role of RGCN in ABSA, this paper first removed nodes from the knowledge graph (the edges connected to them also disappear) to observe the effect on the model, then conducted the same experiments on the knowledge graph with deleted nodes. As seen from Figure 5, the model does not work well without the help of the nodes. On the two datasets, the Acc of the model decreased by 3.95% and 5.47% and the F1 decreased by 14.71% and 8.33%, respectively, indicating that the contribution of the RGCN module is very significant. As seen from the table, the Acc and F1 metrics decrease on all three datasets, which indicates that removing nodes has a greater impact on the model than removing CRF or TransH. One possible reason is that the nodes learn the polarity representations of aspect and sentiment words and then propagate this representation information to neighboring nodes and into the prediction module. Therefore, the knowledge graph constructed by RGCN improves the sentiment-polarity prediction for aspect and sentiment words, reflecting the necessity of graph neural network learning over the nodes and edges of the aspect word–sentiment word graph.

5. Conclusions

This paper proposes the SABKG model for improving the ABSA task, and two essential conclusions can be drawn from the analysis. First, this paper combines the aspect extraction task and the aspect sentiment classification task to complete the ABSA task through the interaction between the two tasks. Unlike general BERT-based models, the method in this paper integrates part-of-speech information into the output representation of BERT and obtains the semantic feature information of the input text through linguistic knowledge, which addresses the difficulty the BERT model has in describing the linguistic knowledge contained in text. Second, this paper learns the embeddings in the “aspect word, sentiment polarity, sentiment word” triple through RGCN, which enriches the contextual relationship between the aspect word and the sentiment word in the text to better predict aspect–sentiment polarity. The experimental results on three open datasets show that the proposed model achieves state-of-the-art performance compared with previous models. Further analysis shows that the learning method that fuses semantic information and uses knowledge graphs has a better ABSA performance than the standard end-to-end multi-task learning method, which shows that the proposed method is credible and compelling.

Author Contributions

Methodology, Z.H. and H.W.; software, Z.H.; validation, Z.H. and H.W.; writing-original draft preparation, Z.H.; writing-review and editing, Z.H. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data included in this study are available upon request by contacting the first author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, J.; Zhang, A.; Liu, D.; Bian, Y. Customer preferences extraction for air purifiers based on fine-grained sentiment analysis of online reviews. Knowl.-Based Syst. 2021, 228, 107259. [Google Scholar] [CrossRef]
  2. Yousefinaghani, S.; Dara, R.; Mubareka, S.; Papadopoulos, A.; Sharif, S. An analysis of COVID-19 vaccine sentiments and opinions on Twitter. Int. J. Infect. Dis. 2021, 108, 256–262. [Google Scholar] [CrossRef] [PubMed]
  3. Li, H.; Bruce, X.; Li, G.; Gao, H. Restaurant survival prediction using customer-generated content: An aspect-based sentiment analysis of online reviews. Tour. Manag. 2023, 96, 104707. [Google Scholar] [CrossRef]
  4. Yan, C.; Liu, J.; Liu, W.; Liu, X. Research on public opinion sentiment classification based on attention parallel dual-channel deep learning hybrid model. Eng. Appl. Artif. Intell. 2022, 116, 105448. [Google Scholar] [CrossRef]
  5. Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. Nrc-canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 437–442. [Google Scholar]
  6. Manandhar, S. Semeval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014. [Google Scholar]
  7. Zhang, Z.; Lan, M. ECNU: Extracting Effective Features from Multiple Sequential Sentences for Target-dependent Sentiment Analysis in Reviews. In Proceedings of the SemEval@ NAACL-HLT, Denver, CO, USA, 4–5 June 2015; pp. 736–741. [Google Scholar]
  8. Ma, Y.; Peng, H.; Cambria, E. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  9. Bao, L.; Lambert, P.; Badia, T. Attention and lexicon regularized LSTM for aspect-based sentiment analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy, 28 July–2 August 2019; pp. 253–259. [Google Scholar]
  10. Tang, H.; Ji, D.; Li, C.; Zhou, Q. Dependency graph enhanced dual-transformer structure for aspect-based sentiment classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6578–6588. [Google Scholar]
  11. Lv, Y.; Wei, F.; Cao, L.; Peng, S.; Niu, J.; Yu, S.; Wang, C. Aspect-level sentiment analysis using context and aspect memory network. Neurocomputing 2021, 428, 195–205. [Google Scholar] [CrossRef]
  12. Wu, C.; Xiong, Q.; Yang, Z.; Gao, M.; Li, Q.; Yu, Y.; Wang, K.; Zhu, Q. Residual attention and other aspects module for aspect-based sentiment analysis. Neurocomputing 2021, 435, 42–52. [Google Scholar] [CrossRef]
  13. Liu, M.; Zhou, F.; Chen, K.; Zhao, Y. Co-attention networks based on aspect and context for aspect-level sentiment analysis. Knowl.-Based Syst. 2021, 217, 106810. [Google Scholar] [CrossRef]
  14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  15. Zhang, H.; Pan, F.; Dong, J.; Zhou, Y. BERT-IAN Model for Aspect-based Sentiment Analysis. In Proceedings of the 2020 International Conference on Communications, Information System and Computer Engineering (CISCE), Kuala Lumpur, Malaysia, 3–5 July 2020; pp. 250–254. [Google Scholar]
  16. Tang, D.; Qin, B.; Feng, X.; Liu, T. Effective LSTMs for target-dependent sentiment classification. arXiv 2015, arXiv:1512.01100. [Google Scholar]
  17. Basiri, M.E.; Nemati, S.; Abdar, M.; Cambria, E.; Acharya, U.R. ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener. Comput. Syst. 2021, 115, 279–294. [Google Scholar] [CrossRef]
  18. Li, X.; Bing, L.; Li, P.; Lam, W. A unified model for opinion target extraction and target sentiment prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6714–6721. [Google Scholar]
  19. Teng, Z.; Vo, D.T.; Zhang, Y. Context-sensitive lexicon features for neural sentiment analysis. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1629–1638. [Google Scholar]
  20. Tay, Y.; Luu, A.T.; Hui, S.C.; Su, J. Attentive gated lexicon reader with contrastive contextual co-attention for sentiment classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3443–3453. [Google Scholar]
  21. Yang, M.; Jiang, Q.; Shen, Y.; Wu, Q.; Zhao, Z.; Zhou, W. Hierarchical human-like strategy for aspect-level sentiment classification with sentiment linguistic knowledge and reinforcement learning. Neural Netw. 2019, 117, 240–248. [Google Scholar] [CrossRef] [PubMed]
  22. Ghosal, D.; Hazarika, D.; Roy, A.; Majumder, N.; Mihalcea, R.; Poria, S. Kingdom: Knowledge-guided domain adaptation for sentiment analysis. arXiv 2020, arXiv:2005.00791. [Google Scholar]
  23. Chen, F.; Huang, Y. Knowledge-enhanced neural networks for sentiment analysis of Chinese reviews. Neurocomputing 2019, 368, 51–58. [Google Scholar] [CrossRef]
  24. Fares, M.; Moufarrej, A.; Jreij, E.; Tekli, J.; Grosky, W. Unsupervised word-level affect analysis and propagation in a lexical knowledge graph. Knowl.-Based Syst. 2019, 165, 432–459. [Google Scholar] [CrossRef]
  25. Cambria, E.; Li, Y.; Xing, F.Z.; Poria, S.; Kwok, K. SenticNet 6: Ensemble application of symbolic and subsymbolic AI for sentiment analysis. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Gold Coast, QLD, Australia, 19–23 October 2020; pp. 105–114. [Google Scholar]
  26. Cavallari, S.; Cambria, E.; Cai, H.; Chang, K.C.C.; Zheng, V.W. Embedding both finite and infinite communities on graphs [application notes]. IEEE Comput. Intell. Mag. 2019, 14, 39–50. [Google Scholar] [CrossRef]
  27. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  28. Sun, C.; Huang, L.; Qiu, X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. arXiv 2019, arXiv:1903.09588. [Google Scholar]
  29. Xu, H.; Liu, B.; Shu, L.; Yu, P.S. BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv 2019, arXiv:1904.02232. [Google Scholar]
  30. Wang, W.; Pan, S.J.; Dahlmeier, D.; Xiao, X. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  31. Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615. [Google Scholar]
  32. Xu, K.; Zhao, H.; Liu, T. Aspect-specific heterogeneous graph convolutional network for aspect-based sentiment classification. IEEE Access 2020, 8, 139346–139355. [Google Scholar] [CrossRef]
  33. Wang, F.; Lan, M.; Wang, W. Towards a one-stop solution to both aspect extraction and sentiment analysis tasks with neural multi-task learning. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. [Google Scholar]
  34. Liang, Y.; Meng, F.; Zhang, J.; Chen, Y.; Xu, J.; Zhou, J. A dependency syntactic knowledge augmented interactive architecture for end-to-end aspect-based sentiment analysis. Neurocomputing 2021, 454, 291–302. [Google Scholar] [CrossRef]
  35. Li, X.; Bing, L.; Zhang, W.; Lam, W. Exploiting BERT for end-to-end aspect-based sentiment analysis. arXiv 2019, arXiv:1910.00883. [Google Scholar]
  36. Luo, H.; Li, T.; Liu, B.; Zhang, J. DOER: Dual cross-shared RNN for aspect term-polarity co-extraction. arXiv 2019, arXiv:1906.01794. [Google Scholar]
  37. Du, C.; Sun, H.; Wang, J.; Qi, Q.; Liao, J.; Xu, T.; Liu, M. Capsule network with interactive attention for aspect-level sentiment classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5489–5498. [Google Scholar]
  38. Bie, Y.; Yang, Y. A multitask multiview neural network for end-to-end aspect-based sentiment analysis. Big Data Min. Anal. 2021, 4, 195–207. [Google Scholar] [CrossRef]
  39. Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational graph attention network for aspect-based sentiment analysis. arXiv 2020, arXiv:2004.12362. [Google Scholar]
  40. Chen, Z.; Qian, T. Relation-aware collaborative learning for unified aspect-based sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3685–3694. [Google Scholar]
  41. Mao, R.; Li, X. Bridging towers of multi-task learning with a gating mechanism for aspect-based sentiment analysis and sequential metaphor identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 13534–13542. [Google Scholar]
Figure 1. Overall structure of SABKG.
Figure 2. Heterogeneous graph of “aspect word, sentiment polarity, sentiment word”.
Figure 3. Comparison with baseline models.
Figure 4. Comparison with BERT-based models.
Figure 5. The effect of missing different modules on the three datasets.
Table 1. Data sets.

Dataset   Train (Pos / Neu / Neg)   Test (Pos / Neu / Neg)
Rest14    2164 / 637 / 807          728 / 196 / 196
Lap14     994 / 464 / 870           341 / 169 / 128
Rest15    1178 / 50 / 382           439 / 35 / 328
Table 2. Comparison of different models’ performance.

Model                 Laptop14 (Acc / F1)   Rest14 (Acc / F1)   Rest15 (Acc / F1)
CMLA-ALSTM [30,31]    70.25 / 53.68         77.46 / 63.87       81.03 / 54.79
AHGCN [32]            76.80 / 73.00         82.02 / 72.67       79.94 / 62.79
MNN [33]              70.40 / 53.80         77.17 / 63.87       80.79 / 56.57
INABSA [18]           72.30 / 55.88         79.68 / 66.60       82.56 / 57.38
DREGCN [34]           77.86 / 61.60         81.88 / 70.21       86.16 / 73.35
MTMVN [38]            -     / 55.08         -     / 65.20       -     / -
DOER [36]             -     / 56.71         -     / 68.55       -     / 50.31
R-GAT [37]            77.42 / 73.76         83.30 / 76.08       80.83 / 64.17
IACapsNet [39]        76.80 / 73.29         81.79 / 73.40       -     / -
RACL [40]             73.53 / 58.28         81.42 / 69.59       83.26 / 59.85
SABKG (proposed)      78.18 / 74.38         81.22 / 77.82       85.00 / 70.32
Table 3. Comparison with BERT-based models.

Model                 Laptop14 (Acc / F1)   Rest14 (Acc / F1)
BERT + GRU [35]       61.88 / 61.12         70.61 / 73.24
GBM-BERT [41]         -     / 65.61         -     / 75.73
R-GAT-BERT [39]       78.21 / 74.07         86.60 / 81.35
RACL-BERT [40]        -     / 63.40         -     / 75.42
SABKG (proposed)      78.18 / 74.38         81.22 / 77.82
Table 4. The effect of different modules on the model.

Model      Laptop14 (Acc / F1)   Rest14 (Acc / F1)   Rest15 (Acc / F1)
FULL       78.18 / 74.38         81.22 / 77.82       85.00 / 70.32
-TransH    76.23 / 60.55         75.63 / 71.86       84.68 / 66.31
-RGCN      74.23 / 59.67         72.35 / 69.49       81.16 / 63.35
-CRF       75.14 / 63.11         74.86 / 70.88       83.12 / 65.54
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
