Article

PHNN: A Prompt and Hybrid Neural Network-Based Model for Aspect-Based Sentiment Classification

1
College of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China
2
Heilongjiang Key Laboratory of Big Data Network Security Detection and Analysis, Qiqihar University, Qiqihar 161006, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(19), 4126; https://doi.org/10.3390/electronics12194126
Submission received: 21 August 2023 / Revised: 8 September 2023 / Accepted: 14 September 2023 / Published: 3 October 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

Aspect-based sentiment classification (ABSC) is an important task in natural language processing (NLP) that aims to predict the sentiment polarity of different aspects in a sentence. The attention mechanism and pre-trained models are commonly used in ABSC tasks. However, a single pre-trained model typically does not perceive downstream tasks very well, and the attention mechanism usually neglects the syntactic information of sentences. In this paper, we propose a prompt and hybrid neural network (PHNN) model, which utilizes the prompt and a hybrid neural network structure to solve the ABSC task. More precisely, it first uses the prompt to convert an input sentence into cloze-type text and utilizes RoBERTa to deal with the input. Then, it applies the graph convolutional neural network (GCN) combined with the convolutional neural network (CNN) to extract the syntactic features of the sentence while using bi-directional long short-term memory (BiLSTM) to obtain the semantic features of the sentence. Further, it utilizes the multi-head attention (MHA) mechanism to learn attention in the sentence and aspect words. Finally, the sentiment polarity of the aspect words is obtained by using the softmax function. Experiments on three benchmark datasets show that PHNN has the best performance compared with other baselines, validating the efficiency of our model.

1. Introduction

Sentiment analysis (SA) is an important research area of NLP that studies the emotions and attitudes expressed about an entity in natural language text. ABSC is an entity-level, fine-grained SA task that aims to determine the sentiment polarity (e.g., negative, neutral, or positive) of an entity in a sentence. For example, the restaurant review “poor restaurant environment but good food” contains two aspects with opposite sentiment polarities: the aspect word “food” carries a positive sentiment, and the aspect word “environment” carries a negative sentiment. ABSC can thus identify the attitude towards a particular aspect, instead of simply assigning a single sentiment polarity to the whole sentence.
Traditional research has utilized various neural networks with attention mechanisms to extract sentence representations [1,2,3]. However, attention-based models only pay attention to the semantic information of a sentence, ignoring its syntactic dependency information. When the sentence contains multiple sentiment words with opposite polarities, the attention mechanism easily focuses on opinion words that are unrelated to the aspect words. Taking the sentence in Figure 1 as an example, for the aspect “environment”, the opinion word “good” may receive more attention than the opinion word “poor”, even though “good” is related to another aspect of the sentence, namely, “food”.
The graph neural network (GNN) model is suitable for processing unstructured information. Using GNN on the syntactic dependency tree to solve the ABSC task usually has better results than traditional neural networks since the dependency tree can establish connections between related words. Considering Figure 1 as an example, there is a dependency between the aspect word “environment” and the opinion word “poor”. Zhang et al. [4] applied GCN to the ABSC task, using the dependency tree and attention mechanism for sentiment classification. Huang et al. [5] utilized the graph attention network and MHA to update the feature representations of nodes. Zhao et al. [6] proposed a GCN-based ABSC model, effectively capturing the sentiment dependencies among multiple aspects in a sentence.
Since the emergence of large-scale, pre-trained models, such as BERT [7] and RoBERTa [8], NLP tasks have generally begun to fine-tune from pre-trained models. E.g., Ranaldi et al. [9] compared BERT and interpretable tree-based approaches to study the syntactic knowledge of downstream tasks and demonstrated the effectiveness of the BERT-based model. However, researchers have found a gap between downstream tasks and pre-trained models. That is, when solving downstream tasks based on the pre-trained model, the pre-trained model has to be adapted to the downstream task. Prompt technology addresses this problem. Some recent papers have used prompts attached to raw input text to guide language models to perform different tasks. One of the earliest examples is [10], which evaluated the efficiency of the GPT-2 model on downstream tasks by using prompts without any fine-tuning. Brown et al. [11] added prompts to sentences to make accurate predictions in the classification task, converting the task into a pre-training task, that is, a masked language model (MLM). Schick et al. [12] used prompts to achieve advanced results in text classification.
Based on the above analysis, to better adapt the pre-trained model to downstream tasks and make full use of the semantic and syntactic information of sentences, this paper proposes the PHNN model, which adds a prompt to adjust the input sequence and captures the sentiment of the aspect words through a hybrid neural network. This approach better extracts aspect words combined with contextual semantic and syntactic information. The validity of this model is verified on three benchmark datasets, and the contributions of this paper are summarized as follows:
  • This paper utilizes the prompt technology to convert the input into cloze-type text, making the downstream ABSC task more suitable for the pre-trained model;
  • This paper proposes an effective PHNN model, which utilizes RoBERTa to deal with the prompt inputs and then employs a hybrid neural network consisting of GCN, CNN, BiLSTM, and MHA to solve the ABSC task;
  • Extensive experiments are conducted, and the results demonstrate that PHNN performs best on SemEval 2014 and Twitter datasets compared with other baseline models.
The rest of the paper is organized as follows: the related work is reviewed in Section 2, and the details of PHNN are introduced in Section 3. Experiments are conducted and analyzed in Section 4, and the paper concludes in Section 5.

2. Related Work

ABSC is a fine-grained subtask of aspect-based sentiment analysis (ABSA) that seeks to identify the sentiment polarity of a given aspect in a sentence. Classical methods mainly utilize CNN, recurrent neural networks (RNNs), and attention mechanisms to solve the ABSC task. Fan et al. [13] proposed the incorporation of attention in CNN to capture word expressions in sentences. Joshi et al. [14] applied CNN to extract features from text attention-based neural networks and model the semantic relations between sentence and aspect words. Xu et al. [15] proposed an MHA network to solve the ABSC problem when aspects contain multiple words. Zhang et al. [16] proposed an attention network that combined the two attention parts of a sentence to obtain better contextual representation.
In recent years, GNN has received much attention due to its ability to deal with unstructured content. Moreover, in ABSC tasks, GNN can handle syntactic dependency trees. Sun et al. [17] constructed a dependency tree model using BiLSTM to learn sentence feature representation and enhance sentence presentation through GCN. Wang et al. [18] pruned and reshaped the ordinary dependency tree and proposed a relational graph attention network to encode the new dependency tree.
With the development of language models, pre-trained models have achieved remarkable results on many NLP tasks, e.g., BERT and RoBERTa. In ABSA tasks, pre-trained models convert traditional static word vectors into dynamic word vectors with better dynamic semantic representations, effectively solving the sentiment analysis problem in long sentences and gradually becoming a standard model. Sun et al. [19] devised an aspect-based approach to solve the ABSA task by constructing auxiliary sentences and converting ABSA into a sentence-to-sentence classification problem. Yin et al. [20] proposed SentiBERT, a variant of BERT that can capture the sentiment features of a text more effectively. Alexandridis et al. [21] used BERT to perform emotion detection in social media text written in Greek. Sirisha et al. [22] combined RoBERTa and LSTM to analyze people’s emotions on the conflict between Ukraine and Russia through Twitter data. Although the pre-trained model is helpful in NLP tasks, it often suffers from the problem that it is less aware of the downstream task, and thus fails to exploit its full potential.
The prompt is a new fine-tuning paradigm inspired by GPT-3 [11], which has better semantic modeling for NLP tasks. The common practice for the prompt technology is to insert prompts with [mask] into the original input text and pre-train the model to predict words likely to occur at [mask] locations. Li et al. [23] first applied prompts to ABSA tasks, given known aspects and perspectives, constructing successive prompts to predict the corresponding sentiment categories. Gao et al. [24] dynamically selected cases relevant to each context to generate prompts to fine-tune the model automatically. Hu et al. [25] introduced knowledgeable prompt tuning to utilize external knowledge of sentences, thus improving the stability of prompt tuning.
To solve the problem of inconsistent upstream and downstream ABSC tasks based on the pre-trained model, this paper designs input text based on the prompt, splices the original sentence, prompt text, and aspect words as the input of the pre-trained model, uses GCN combined with CNN to extract the syntactic information of the sentence, utilizes BiLSTM to obtain the semantic information of the sentence, and, finally, uses MHA to interact with the sentence and aspect words to further extract sentiment information.

3. Methodology

Suppose that a sentence $X = \{x_1, x_2, \ldots, x_{t+1}, \ldots, x_{t+c}, \ldots, x_n\}$ contains one or more aspect terms $A = \{x_{t+1}, x_{t+2}, \ldots, x_{t+c}\}$, each composed of $c$ aspect words, where $c \geq 1$ and $A \subseteq X$. ABSC aims to predict the sentiment polarity of a particular aspect term in the given sentence. To solve the ABSC problem, we propose the PHNN model. The model’s architecture is shown in Figure 2. It consists of three layers: the prompt text construction layer, the syntactic and semantic encoding layer, and the sentiment classification layer. The details of the PHNN model are presented in the rest of this section.

3.1. Prompt Text Construction Layer

The main goal of the prompt text construction layer is to use the prompt mechanism to create prompt text. Adding prompt text helps the model to better understand the semantic relations between context and aspect words, thus aligning the upstream and downstream tasks and making fuller use of MLM. The core of the prompt mechanism is to use prompt text marked with [mask] to mimic the pre-training objective of the model. In this way, the sentiment analysis task is transformed into a cloze task, which this paper implements with MLM. Different from BERT, [CLS] is marked as <s>, and [SEP] is marked as </s>. Adding a prompt to the input text leverages the ability of the pre-trained model and improves its perception of downstream tasks. Figure 3 shows the process of the prompt text construction in this paper.
As shown in Figure 3, given a sentence $X$ and an aspect term $A$, we change the original sentence $X$ to $X + P$, where the prompt text $P$ is defined as $P = P_{left} + A + P_{right}$. More precisely, $P_{left}$ is defined as “What is the sentiment about” and $P_{right}$ is defined as “? It was <mask>”. E.g., if the original input sentence is $X$ = “poor restaurant environment but good food”, then for the aspect word “food”, the final sentence constructed by the prompt template $P$ is “<s> poor restaurant environment but good food </s> What is the sentiment about food? It was <mask> </s>”. This paper uses RoBERTa and a sentence pair approach to generate the embedding vector representation of an input text, where the constructed prompt text $O_{inputs}$ is combined with the aspect term $O_{aspects}$ to form sentence pairs. The details are as follows:
$$O_{inputs} = \text{<s>} + X + \text{</s>} + P + \text{</s>} \tag{1}$$
$$O_{aspects} = \text{<s>} + A + \text{</s>} \tag{2}$$
where $X$ is the original input sentence, <s> is the unique identifier of each input sentence, </s> is the identifier of the contextual sentence, $P$ is the prompt text incorporating the aspect term, and $A$ is the aspect term.
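The construction in Equations (1) and (2) can be sketched as plain string assembly. This is a minimal illustration, not the authors' code: the helper name `build_prompt_inputs` and the whitespace handling are assumptions, and in practice a RoBERTa tokenizer would normally insert the special tokens itself.

```python
from typing import Tuple

# Special tokens follow the RoBERTa conventions described in the text.
BOS, SEP, MASK = "<s>", "</s>", "<mask>"

# The two halves of the prompt template quoted in the paper.
P_LEFT = "What is the sentiment about"
P_RIGHT = "? It was"


def build_prompt_inputs(sentence: str, aspect: str) -> Tuple[str, str]:
    """Return (O_inputs, O_aspects) for one sentence/aspect pair.

    Hypothetical helper: the name and spacing rules are illustrative only.
    """
    # P = P_left + A + P_right, with the mask token closing the cloze.
    prompt = f"{P_LEFT} {aspect}{P_RIGHT} {MASK}"
    o_inputs = f"{BOS} {sentence} {SEP} {prompt} {SEP}"
    o_aspects = f"{BOS} {aspect} {SEP}"
    return o_inputs, o_aspects


o_inputs, o_aspects = build_prompt_inputs(
    "poor restaurant environment but good food", "food"
)
print(o_inputs)
print(o_aspects)
```

Calling the helper once per aspect term yields one cloze-style input per aspect, so a sentence with two aspects produces two separate classification instances.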
The input text is transformed into word vectors using operations such as word separation and word embedding, and the <mask> tokens are predicted by using the MLM task in the pre-trained model. In ABSC tasks, pre-trained-based models such as BERT and RoBERTa are commonly used. RoBERTa is an improvement on the BERT model with three main optimizations. Firstly, RoBERTa adopts dynamic masking, which uses a new masking method for each new sequence input, making it more flexible than the fixed masking method in BERT. Secondly, RoBERTa removes the next sentence prediction task from BERT, which has little impact on performance. Finally, RoBERTa expands the batch size and word list, allowing the model to use a larger dataset during pre-training, resulting in richer semantic information at the end of pre-training. Like the BERT model, RoBERTa consists of multiple bi-directional transformer encoders, where the transformer encoder includes components such as self-attention, residual connectivity, and layer normalization.
Using the sentence pair $O_{inputs}$ and $O_{aspects}$ as the input, the context hidden state vector $W^{i}_{inputs} = \{w^{i}_1, w^{i}_2, \ldots, w^{i}_n\}$ and the aspect vector $W^{a}_{aspects} = \{w^{a}_1, w^{a}_2, \ldots, w^{a}_c\}$ are generated by RoBERTa for MLM and RoBERTa, respectively, where $W^{i}_{inputs} \in \mathbb{R}^{d_i \times n}$ and $W^{a}_{aspects} \in \mathbb{R}^{d_a \times c}$, $d_i$ and $d_a$ are the word-embedding dimensions of RoBERTa for MLM and RoBERTa, respectively, and $n$ and $c$ are the lengths of the input sentence and the aspect words, respectively. The formulas are shown as follows:
$$W^{i}_{inputs} = \text{RoBERTa}_{\text{MLM}}(O_{inputs}) \tag{3}$$
$$W^{a}_{aspects} = \text{RoBERTa}(O_{aspects}) \tag{4}$$

3.2. Syntactic and Semantic Encoding Layer

GCN can be considered as an extension of traditional CNN to encode the local information of unstructured data. GCN combines hidden state vectors with dependency trees to construct a text graph and utilizes convolutional operations on the graph to obtain the syntactic features of aspect words. Moreover, GCN uses the information related to the node’s neighbor nodes to model multiple layers so that each node’s final hidden state can receive information from its more distant neighboring nodes. Given that a text has $n$ words and each word is a node in the text graph, an adjacency matrix $A_{ij} \in \mathbb{R}^{n \times n}$ can be obtained. For an $L$-layer GCN with $l \in [1, 2, \ldots, L]$, let the output of the $l$-th layer for node $i$ be $g_i^l$; this can be calculated as shown in Equation (5):
$$g_i^l = \sigma\left(\sum_{j=1}^{k} A_{ij} W^l g_j^{l-1} + b^l\right) \tag{5}$$
where $A_{ij}$ denotes the syntactic structural adjacency matrix produced by the dependency tree parser, $g_j^{l-1}$ is the output of the previous layer for the neighboring node $j$, $W^l$ is the weight matrix of the $l$-th layer, $b^l$ is the bias of the $l$-th layer, and $\sigma$ is a non-linear activation function, such as ReLU.
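A single graph-convolution step of the kind described by Equation (5) can be sketched in plain Python. The sketch assumes the standard GCN reading in which node i aggregates the previous-layer states of its neighbors j; the self-loops, the toy dependency edge, and both function names are illustrative assumptions rather than details taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def adjacency_from_edges(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Build a symmetric n x n adjacency matrix from dependency edges."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop: an assumption, common in GCN variants
    for head, dep in edges:
        a[head][dep] = 1.0
        a[dep][head] = 1.0
    return a


def gcn_layer(a: Matrix, g_prev: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """One layer: g_i = ReLU(sum_j A_ij * (W g_j) + b)."""
    n, d_out = len(a), len(w)
    # Project every node once: wg[j] = W g_j.
    wg = [
        [sum(w_row[k] * g_j[k] for k in range(len(g_j))) for w_row in w]
        for g_j in g_prev
    ]
    out = []
    for i in range(n):
        row = []
        for o in range(d_out):
            s = sum(a[i][j] * wg[j][o] for j in range(n)) + b[o]
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


# Toy graph: three words, one dependency edge between word 0 and word 1.
adj = adjacency_from_edges(3, [(0, 1)])
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, states, identity, [0.0, 0.0]))
```

With identity weights, words 0 and 1 end up with the sum of each other's states while the isolated word 2 keeps its own, which is the neighbor-mixing behavior the text describes. Stacking two such layers, as in the experimental setting, lets information travel across two dependency hops.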
The context hidden state vector $W^{i}_{inputs}$ generated by RoBERTa for MLM and the syntactic structural matrix $A_{ij}$ are fed into GCN, and the final output of GCN at layer $L$ is $G^L = \{g_1^L, g_2^L, \ldots, g_{n-1}^L, g_n^L\}$. The CNN layer in the PHNN model continues modeling the output of GCN, further extracting text features. Then, the output is fed into ReLU. Compared with the earlier sigmoid function, ReLU speeds up the convergence of model training and makes gradient descent and backpropagation more effective, helping to avoid the problems of exploding and vanishing gradients. The process of extracting features in CNN is shown in Equation (6):
$$c_i = f(W \cdot G^L + b) \tag{6}$$
where $W \in \mathbb{R}^{h \times m}$ denotes the convolution kernel, $h \times m$ is the convolution kernel size, $b$ is the bias, and $f$ is the ReLU activation function.
The output of GCN is convolved to obtain the vector $c_i$, which is sequentially spliced into the matrix $C$. After the CNN is connected to the maximum pooling layer, each convolutional kernel obtains the scalar $\hat{C} = \max\{C\}$. In this paper, we use more than one convolutional kernel for feature extraction. After the maximum pooling layer, the features are concatenated to obtain the feature vector $Z$.
$$Z = [\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_m] \tag{7}$$
where $m$ is the number of convolutional kernels.
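The convolution and pooling steps of Equations (6) and (7) can be illustrated with a small sketch. It assumes a valid (unpadded) 2D convolution and a global maximum per kernel, matching the statement that each kernel yields one scalar; the function names and the toy matrices are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_relu(
    g: Matrix, kernel: Matrix, bias: float, stride: Tuple[int, int] = (1, 1)
) -> List[float]:
    """Slide an h x m kernel over g without padding and apply ReLU."""
    h, m = len(kernel), len(kernel[0])
    rows, cols = len(g), len(g[0])
    feature_map = []
    for r in range(0, rows - h + 1, stride[0]):
        for c in range(0, cols - m + 1, stride[1]):
            s = bias + sum(
                kernel[u][v] * g[r + u][c + v]
                for u in range(h)
                for v in range(m)
            )
            feature_map.append(max(0.0, s))  # f = ReLU
    return feature_map


def extract_features(
    g: Matrix,
    kernels: List[Matrix],
    biases: List[float],
    stride: Tuple[int, int] = (1, 1),
) -> List[float]:
    """Z = [max(C_1), ..., max(C_m)]: one pooled scalar per kernel."""
    return [
        max(conv2d_relu(g, k, b, stride)) for k, b in zip(kernels, biases)
    ]


# Toy GCN output with three nodes and two features per node.
g_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],      # responds to the diagonal pattern
    [[-1.0, -1.0], [-1.0, -1.0]],  # always negative here, so ReLU gives 0
]
z = extract_features(g_out, kernels, [0.0, 0.0])
print(z)  # [9.0, 0.0]
```

The length of `z` equals the number of kernels, so the six kernels reported in the experimental setting would give a six-dimensional feature vector under this reading.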
BiLSTM is a special RNN that captures long-term dependencies in a sentence. In the PHNN model, the hidden state vector generated by RoBERTa for MLM is fed into BiLSTM, allowing the model to encode the input in both the forward and backward directions. Each LSTM unit in BiLSTM contains three gates: an input gate, an output gate, and a forget gate. These gate mechanisms allow the model to selectively remember or ignore information when processing input sequences, and thus allow the semantic and contextual relationships of the sentences to be better captured. Through the BiLSTM encoding process, the model can obtain a sentence representation that integrates forward and backward information, providing much richer semantic expressiveness for subsequent tasks. The specific BiLSTM unit computation process is shown in Equations (8)–(13):
$$i_t = \sigma(W_i \cdot [h_{t-1}; x_t] + b_i) \tag{8}$$
$$f_t = \sigma(W_f \cdot [h_{t-1}; x_t] + b_f) \tag{9}$$
$$o_t = \sigma(W_o \cdot [h_{t-1}; x_t] + b_o) \tag{10}$$
$$g_t = \tanh(W_r \cdot [h_{t-1}; x_t] + b_r) \tag{11}$$
$$c_t = i_t \ast g_t + f_t \ast c_{t-1} \tag{12}$$
$$h_t = o_t \ast \tanh(c_t) \tag{13}$$
where $t$ denotes the time step, $x_t$ is the input at $t$, $x_t \in W^{i}_{inputs}$, $h_t$ is the hidden vector representation at time step $t$, $\ast$ represents element-wise multiplication, $\sigma$ denotes the sigmoid activation function, $W_i$ and $b_i$ are the parameters of the input gate, $W_f$ and $b_f$ are the parameters of the forget gate, $W_o$ and $b_o$ are the parameters of the output gate, $W_r$ and $b_r$ are the parameters of the candidate cell state $g_t$, and $c_{t-1}$ and $c_t$ denote the state of the previous cell and the state of the current cell, respectively. The hidden state vector $W^{i}_{inputs}$ generated by RoBERTa for MLM is passed through BiLSTM to obtain the vector $H$, where $H$ is the final output of $h_t$.
$$H = h_t \tag{14}$$
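The gate equations (8)–(13) translate directly into a single LSTM step, and running that step in both directions gives the bidirectional encoding. The following is a minimal pure-Python sketch under stated assumptions: the parameter dictionary layout, the zero initial states, and the per-position concatenation of forward and backward states are illustrative choices, not details confirmed by the paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, list]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W v + b for a row-major weight matrix."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(
    x_t: Vector, h_prev: Vector, c_prev: Vector, p: Params
) -> Tuple[Vector, Vector]:
    """One LSTM step following Equations (8)-(13)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}; x_t]
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]
    g_t = [math.tanh(a) for a in affine(p["W_r"], z, p["b_r"])]
    c_t = [i * g + f * c for i, g, f, c in zip(i_t, g_t, f_t, c_prev)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_lstm(seq: List[Vector], p: Params, hidden: int) -> List[Vector]:
    """Run the cell over a sequence from zero initial states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(seq: List[Vector], fwd: Params, bwd: Params, hidden: int) -> List[Vector]:
    """Concatenate forward and backward hidden states at each position."""
    forward = run_lstm(seq, fwd, hidden)
    backward = run_lstm(seq[::-1], bwd, hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


# All-zero parameters (hidden size 1, input size 2) make every gate 0.5
# and the candidate state 0, which is easy to check by hand.
zero = {
    "W_i": [[0.0] * 3], "b_i": [0.0],
    "W_f": [[0.0] * 3], "b_f": [0.0],
    "W_o": [[0.0] * 3], "b_o": [0.0],
    "W_r": [[0.0] * 3], "b_r": [0.0],
}
h_t, c_t = lstm_step([1.0, -1.0], [0.0], [2.0], zero)
print(h_t, c_t)
```

With these zero parameters the forget gate halves the previous cell state, so `c_t` is `[1.0]` and `h_t` is half of `tanh(1.0)`, which shows how the gates scale what is kept and what is emitted.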
After obtaining the outputs of the maximum pooling and BiLSTM, we use MHA to perform an interactive learning analysis of their outputs with aspect words, capturing possibly missed representations of sentiment features. MHA refers to performing multiple attention functions in parallel to calculate attention. The attention function maps a key sequence $k = \{k_1, k_2, \ldots, k_n\}$ and a query sequence $q = \{q_1, q_2, \ldots, q_m\}$ to the output sequence, as shown in Equation (15):
$$Attention(k, q) = softmax\left(\frac{q k^T}{\sqrt{d_k}}\right) k \tag{15}$$
where $d_k$ is the scale parameter.
MHA integrates single attention and projects it to a specified hidden dimension $d_{hid}$. The formula for calculating the MHA value $MHA(k, q)$ is shown in Equations (16) and (17):
$$MHA(k, q) = Concat(A_1 : A_2 : \ldots : A_r) \cdot W_{mh} \tag{16}$$
$$A_h = Attention_h(k, q) \tag{17}$$
where $W_{mh} \in \mathbb{R}^{d_{hid} \times d_{hid}}$, $A_h$ is the output of the $h$-th attention head, $h \in [1, 2, \ldots, r]$, $r$ is the number of heads, and “:” denotes vector concatenation.
We obtain the output vector $Z$ of the maximum pooling and the output vector $H$ of the BiLSTM through the previous process and learn the vectors $C_{ca}$ and $C_{la}$ after the MHA interacts with the aspect words’ vector $W^{a}_{aspects}$, as shown in Equations (18) and (19):
$$C_{ca} = MHA(Z, W^{a}_{aspects}) \tag{18}$$
$$C_{la} = MHA(H, W^{a}_{aspects}) \tag{19}$$
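Equations (15)–(17) can be sketched as follows, with the key sequence doubling as the value sequence as in Equation (15). How each head differs is not specified in the text, so this sketch assumes each head attends over an equal slice of the feature dimension; that choice, the function names, and the requirement that keys and queries share one feature size are all assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(k: Matrix, q: Matrix) -> Matrix:
    """Attention(k, q) = softmax(q k^T / sqrt(d_k)) k, one row per query."""
    d_k = len(k[0])
    out = []
    for q_i in q:
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * k_j[d] for w, k_j in zip(weights, k)) for d in range(d_k)]
        )
    return out


def multi_head_attention(k: Matrix, q: Matrix, r: int, w_mh: Matrix) -> Matrix:
    """Concatenate r heads and project with W_mh (d x d).

    Assumption: head h uses the h-th equal slice of the feature dimension.
    """
    d = len(k[0])
    if d % r != 0:
        raise ValueError("feature dimension must be divisible by the head count")
    step = d // r
    heads = []
    for h in range(r):
        sl = slice(h * step, (h + 1) * step)
        heads.append(attention([row[sl] for row in k], [row[sl] for row in q]))
    concat = [[x for head in heads for x in head[i]] for i in range(len(q))]
    return [
        [sum(row[a] * w_mh[a][b] for a in range(d)) for b in range(d)]
        for row in concat
    ]


# Two identical keys: every query must return that key unchanged.
keys = [[1.0, 0.0], [1.0, 0.0]]
queries = [[3.0, 4.0]]
print(attention(keys, queries))  # [[1.0, 0.0]]
```

In the model, the keys would be the pooled CNN features or the BiLSTM states and the queries would be the aspect vectors, so each output row summarizes the sentence from the point of view of one aspect token.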

3.3. Sentiment Classification Layer

The vectors $C_{ca}$ and $C_{la}$ obtained from MHA are combined into $H_{fin}$ and then averaged to obtain $H_{avg}$. The averaged vector is fed into a linear layer followed by the softmax function to generate the sentiment polarity probability distribution $y$. The calculation process is shown in Equations (20)–(22):
$$H_{fin} = [C_{ca} : C_{la}] \tag{20}$$
$$x = W_a H_{avg} + b_a \tag{21}$$
$$y = softmax(x) \tag{22}$$
where $W_a$ and $b_a$ are the learnable parameter matrix and bias, respectively.
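A rough sketch of Equations (20)–(22) is given below. The text does not state along which axis the two MHA outputs are concatenated or averaged, so the sketch assumes feature-wise concatenation at each position followed by a mean over positions; the function name, the label order, and the toy weights are likewise hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical label order for the three output units.
LABELS = ["negative", "neutral", "positive"]


def classify(c_ca: Matrix, c_la: Matrix, w_a: Matrix, b_a: Vector) -> Vector:
    """Return the softmax distribution y over sentiment classes."""
    # H_fin = [C_ca : C_la], concatenated feature-wise at each position.
    h_fin = [u + v for u, v in zip(c_ca, c_la)]
    # H_avg: mean over positions (assumed axis).
    n = len(h_fin)
    h_avg = [sum(col) / n for col in zip(*h_fin)]
    # x = W_a H_avg + b_a
    x = [sum(w * h for w, h in zip(row, h_avg)) + b for row, b in zip(w_a, b_a)]
    # y = softmax(x), computed stably.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


c_ca = [[1.0, 2.0], [3.0, 4.0]]
c_la = [[0.5, 0.5], [1.5, 1.5]]
w_a = [[0.0] * 4 for _ in LABELS]  # zero weights: only the bias matters
y = classify(c_ca, c_la, w_a, [0.0, 0.0, 10.0])
print(LABELS[y.index(max(y))])  # positive
```

The predicted polarity is the class with the highest probability in `y`.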

3.4. Training

Using a gradient descent algorithm, the model is trained using a cross-entropy loss and L2 regularization, as shown in Equation (23):
$$Loss = -\sum_{i=1}^{D} \sum_{j=1}^{C} \hat{y}_{ij} \log y_{ij} + \lambda \|\theta\|^2 \tag{23}$$
where $D$ is the size of the training set, $C$ takes a value of 3 because the dataset includes positive, neutral, and negative labels, $y_{ij}$ is the predicted sentiment category of the text, $\hat{y}_{ij}$ is the true sentiment category of the text, $\lambda \|\theta\|^2$ is the regularization term, $\theta$ denotes the set of all trainable parameters, and $\lambda$ denotes the L2 regularization coefficient.
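As a worked illustration of Equation (23), the loss can be computed by hand for a tiny batch. The sketch assumes one-hot true labels and the usual negative sign on the cross-entropy term; the function name and the flattened parameter list are illustrative.

```python
import math
from typing import List


def cross_entropy_l2(
    y_true: List[List[float]],
    y_pred: List[List[float]],
    theta: List[float],
    lam: float,
) -> float:
    """Cross-entropy summed over instances and classes, plus lam * ||theta||^2."""
    ce = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t > 0:  # zero targets contribute nothing; skipping avoids log(0)
                ce -= t * math.log(p)
    l2 = lam * sum(w * w for w in theta)
    return ce + l2


# One instance whose true class is the third one, predicted with p = 0.5.
loss = cross_entropy_l2(
    y_true=[[0.0, 0.0, 1.0]],
    y_pred=[[0.25, 0.25, 0.5]],
    theta=[1.0, 2.0],
    lam=0.1,
)
print(loss)  # log(2) + 0.1 * (1 + 4)
```

The first term shrinks as the predicted probability of the true class approaches 1, while the second term penalizes large parameter values regardless of the predictions.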

4. Experiments

4.1. Datasets

Three datasets are used in the experiments, including the Laptop and Restaurant datasets [26] from SemEval 2014 Task 4 and the Twitter dataset [27]. The first two datasets can be downloaded from https://alt.qcri.org/semeval2014/task4/ (accessed on 15 August 2023). The last dataset can be downloaded from http://goo.gl/5Enpu7 (accessed on 15 August 2023). The Laptop dataset consists of over 3K instances from laptop reviewers. The Restaurant dataset consists of over 3K instances from the reviewers of restaurants. The Twitter dataset contains over 7K tweets about celebrities, products, and companies. Each instance of the above datasets consists of three lines: sentence, aspect words, and the polarity of the aspect words (1: positive, 0: neutral, −1: negative). Each dataset is originally divided into two parts: the train set and the test set. The details are shown in Table 1.
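The three-line instance layout described above (sentence, aspect words, polarity) can be read with a short parser. This is a sketch under assumptions: the function and type names are hypothetical, blank lines are ignored, and the sample text reuses the paper's running example rather than an actual dataset record.

```python
from typing import Iterable, List, NamedTuple


class Instance(NamedTuple):
    sentence: str
    aspect: str
    polarity: int  # 1: positive, 0: neutral, -1: negative


LABELS = {1: "positive", 0: "neutral", -1: "negative"}


def parse_instances(lines: Iterable[str]) -> List[Instance]:
    """Group non-empty lines into (sentence, aspect, polarity) triples."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")
    instances = []
    for i in range(0, len(rows), 3):
        polarity = int(rows[i + 2])
        if polarity not in LABELS:
            raise ValueError(f"unknown polarity label: {rows[i + 2]!r}")
        instances.append(Instance(rows[i], rows[i + 1], polarity))
    return instances


# Illustrative sample, not taken from the datasets themselves.
sample = """poor restaurant environment but good food
food
1
poor restaurant environment but good food
environment
-1
"""
instances = parse_instances(sample.splitlines())
for inst in instances:
    print(inst.aspect, LABELS[inst.polarity])
```

A sentence with several aspects simply appears once per aspect, which matches the per-aspect prediction setting of ABSC.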

4.2. Experimental Setting

In the experiments, for RoBERTa, we use the RoBERTa-base version; the RoBERTa embedding dimension is 768, the RoBERTa for MLM embedding dimension is 50265, the learning rate is $2 \times 10^{-5}$, and the regularization coefficient is $1 \times 10^{-4}$. The number of layers of GCN is 2. In CNN, the number of convolutional kernels, the size of the convolution kernel, and the step size are 6, (6, 100), and (4, 55), respectively. The maximum pooling window size is (2, 1). The dimension of the hidden state vector output by BiLSTM and MHA is 300. The number of attention heads is 8 and the dropout is 0.1 in MHA. The Adam optimizer is used to update all parameters. The model is run on a GeForce RTX 2080 Ti GPU (NVIDIA, Santa Clara, CA, USA).
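For reproduction purposes, the settings listed above could be gathered into one configuration object. The class and field names below are hypothetical, and the learning rate and regularization coefficient are read with negative exponents (2e-5 and 1e-4), which is an assumption based on typical fine-tuning values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PHNNConfig:
    """Hyperparameters as reported in the experimental setting."""

    roberta_version: str = "roberta-base"
    roberta_dim: int = 768
    roberta_mlm_dim: int = 50265
    learning_rate: float = 2e-5       # assumed negative exponent
    l2_coefficient: float = 1e-4      # assumed negative exponent
    gcn_layers: int = 2
    cnn_kernels: int = 6
    cnn_kernel_size: Tuple[int, int] = (6, 100)
    cnn_stride: Tuple[int, int] = (4, 55)
    pool_window: Tuple[int, int] = (2, 1)
    hidden_dim: int = 300             # BiLSTM and MHA output size
    attention_heads: int = 8
    mha_dropout: float = 0.1
    optimizer: str = "Adam"


config = PHNNConfig()
print(config)
```

Freezing the dataclass keeps a run's settings immutable, so a logged configuration always matches what was actually used.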

4.3. Baseline Models

To verify the validity of the PHNN model, we compared it with the following models:
  • AOA [28]. It borrows the idea of attention over attention (AOA) to model aspects and sentences, learning the representation of aspect terms and contexts.
  • ATAE-LSTM [29]. It combines aspect and contextual word embeddings as the input, using LSTM and attention to process the hidden layer to obtain results.
  • TD-LSTM [30]. It uses two LSTM networks to model the text, extending the LSTM for ABSA tasks.
  • ASGCN [4]. It utilizes GCN to model the context, using syntactic information and interdependencies between words for ABSA tasks.
  • IAN [3]. It uses interactive attention to model the relations between context and aspect words, learning the representation of both for ABSA tasks.
  • BERT-SPC [31]. It changes the input of the BERT model to “[CLS] + context + [SEP] + aspect words + [SEP]” for sentence pair classification.
  • AEN-BERT [31]. It utilizes a pre-trained BERT model, an attention-based encoder, to obtain results.
  • R-GAT [18]. It reconstructs the dependency tree to remove redundant information, extending the original GNN to add a relational attention mechanism.
  • R-GAT+BERT [18]. An R-GAT model that is based on pre-trained BERT.
  • DualGCN [32]. It is a dual GCN model and utilizes orthogonal and differential regularizer methods to improve the ability of semantic correlations.
  • DualGCN+BERT [32]. A DualGCN model that is based on pre-trained BERT.
  • SSEGCN [33]. It is a syntactically and semantically enhanced GCN model for ABSA tasks that uses an aspect-aware attention mechanism with self-attention to obtain the attention score matrix of a sentence and enhanced node representations by executing GCN on the attention score matrix.

4.4. Main Results

We use accuracy and macro-averaged F1 values as measures of model performance. The experimental results are shown in Table 2 and the bold data in each column represents the optimal result. The results in Table 2 can be found in more detail in Appendix A.
We observe that PHNN achieves the best performance. It achieves higher accuracy than the best baseline by 2.15, 1.59, and 0.67 on the Restaurant, Laptop, and Twitter datasets, respectively. Additionally, its F1 score is also higher than the best baselines of these datasets by 2.3, 1.49, and 0.76, respectively.
We also see that the pre-trained-based models usually perform better than the non-pre-trained-based models. This is because pre-trained models are trained on a large amount of unlabeled data, enabling them to learn a general representation of language that can be better adapted to various downstream tasks. Moreover, compared with R-GAT, DualGCN, and other syntax-based models using GNN, PHNN performs better because it utilizes semantic information through BiLSTM while using the prompt to adjust the input sequence, which can better stimulate the ability of the pre-trained model. Then, compared with attention-based methods such as ATAE-LSTM and IAN, PHNN performs better because it utilizes the syntactic structure knowledge to establish the dependencies between words, avoiding the noise brought by the attention mechanism. Finally, methods based on syntactic knowledge such as ASGCN and R-GAT achieve better classification results than attention-based methods such as AOA, but these models ignore semantic information, resulting in poorer performance than PHNN.

4.5. Ablation Study

To evaluate each component’s impact on the PHNN model’s overall performance, an ablation study was conducted, and the results are shown in Table 3. The bold data in each column represents the optimal result.
As can be seen from the table, the removal of any module leads to a decrease in the model’s performance. E.g., when removing the prompt template, the accuracy and F1 scores of the model drop by (0.98, 0.38), (2.98, 2.69), and (1.19, 2.04) on the three datasets, respectively, proving that adjusting the input sequence with the template improves the classification performance. We also see that the removal of GCN on the Restaurant dataset has a greater impact on the performance of the model compared to removing other modules, which is similar to the results of the removal of the prompt on the Laptop dataset and the removal of BiLSTM on the Twitter dataset. E.g., in the Restaurant dataset, the removal of GCN results in lower accuracy and F1 scores than the removal of the prompt by 2.28 and 4.16, respectively. This is because GCN can better utilize the syntactic structure information of sentences.

4.6. Case Study

To further explore the ABSC effects of different models, eight aspect words from four example sentences are collected from the test set. AEN-BERT, BERT-SPC, ATAE-LSTM, and ASGCN are compared with PHNN for analysis. The results are shown in Table 4. The symbols P, O, and N represent positive, neutral, and negative sentiment, respectively. The symbols “√” and “×” indicate whether or not the model correctly predicted the sentiment polarity of the aspect. In addition, the results of our PHNN model are shown in bold to clearly display the prediction results.
The first sentence has one aspect and the second and the third sentences have two aspects with opposite sentiment polarity, which are more likely to interfere with attention models. For the first three example sentences, the methods using the BERT pre-trained model have better classification results than the others. Our PHNN makes correct predictions for all three samples, and the results show that PHNN effectively combines syntactic and semantic information; the addition of the prompt and hybrid neural network for syntactic analysis improved the classification results.
We also see a failure case in the last sentence about the aspect words “price tag”. This is because long sentences themselves contain a lot of information, and adding the prompt template for longer sentences may increase the burden of capturing long-distance dependencies, affecting the classification effect.

5. Discussion and Conclusions

ABSC is a well-studied NLP task, and pre-trained models and neural networks are frequently used in ABSC tasks. In response to the downstream task that cannot fully stimulate the ability of the pre-trained model and the attention mechanism that usually neglects the syntactic information of sentences, resulting in information loss and unsatisfactory results, this paper proposes the PHNN model, which utilizes a prompt and hybrid neural network to solve the ABSC task. PHNN contains three main layers: the prompt text construction layer, the syntactic and semantic encoding layer, and the sentiment classification layer. In the prompt text construction layer, we use the prompt to reform the sentence and then input it into the RoBERTa pre-trained model. The prompt knowledge guides the pre-trained model to narrow the gap between the downstream task and the pre-trained model. In the syntactic and semantic encoding layer, we consider both syntactic dependency information and semantic information between contextual sentences. More precisely, we use GCN combined with CNN to extract syntactic features, and utilize BiLSTM to obtain semantic features. Then, we utilize MHA to capture possibly missed representations of sentiment features. In the sentiment classification layer, we obtain the sentiment polarity of the sentence by using the softmax function. Our experiments demonstrate the efficiency of PHNN for the ABSC task.
Our future plan is to investigate other deep learning techniques to further enhance the performance of the proposed model. Additionally, we intend to evaluate our proposed model in other ABSA tasks to verify its effectiveness in addressing sentiment-related issues.

Author Contributions

Conceptualization, W.Z.; methodology, W.Z. and J.L.; software, J.L. and Y.M.; validation, J.L., Y.M. and P.L.; formal analysis, W.Z. and J.L.; investigation, J.L.; writing—review and editing, W.Z., J.L. and Y.M.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Youth Science and Technology Innovation Personnel Training Project of Heilongjiang (Grant No. UNPYSCT-2020072), Fundamental Research Funds for the Universities of Heilongjiang (Grant No. 145109217), and Education Science Fourteenth Five-Year Plan 2021 Project of Heilongjiang (Grant No. GJB1421344).

Data Availability Statement

We have used a publicly available dataset; the link is given in Section 4.1.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Accuracy and F1 are commonly used evaluation metrics for characterizing the quality of a model. We calculate accuracy from the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). F1 is calculated from precision and recall, where precision is the proportion of instances predicted as positive that are actually positive, and recall is the proportion of actual positive instances that the model correctly identifies. The formal definitions of these metrics are given in Equations (A1)–(A4).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{A1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{A2}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{A3}$$
$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{A4}$$
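The four definitions above translate directly into code. The sketch below is a minimal pure-Python version; the function names and the convention of returning 0.0 for an empty denominator are assumptions, not part of the paper. The counts in the usage example (TP = 174, FP = 27, FN = 22) are not reported in the paper; they are inferred here so that the rounded results agree with the Negative class of PHNN on the Restaurant dataset in Table A1.

```python
def _safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Equation (A1): share of all instances that are classified correctly."""
    return _safe_div(tp + tn, tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Equation (A2): share of predicted positives that are truly positive."""
    return _safe_div(tp, tp + fp)


def recall(tp: int, fn: int) -> float:
    """Equation (A3): share of actual positives that are recovered."""
    return _safe_div(tp, tp + fn)


def f1_score(p: float, r: float) -> float:
    """Equation (A4): harmonic mean of precision and recall."""
    return _safe_div(2 * p * r, p + r)


if __name__ == "__main__":
    # Inferred (not reported) counts for one class, treated one-vs-rest.
    tp, fp, fn = 174, 27, 22
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1_score(p, r):.4f}")
```

With these counts, the rounded precision, recall, and F1 are 0.8657, 0.8878, and 0.8766. For a three-class task such as ABSC, the per-class values are obtained by treating each class in turn as the positive class.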
On this basis, Table A1 provides a per-class breakdown of the results in Table 2. We report the precision, recall, and F1-score of each class on the three benchmark datasets. To display the results clearly, the best value for each class in each column is shown in bold. From the table, we see that our PHNN model usually outperforms the other models in identifying the positive and negative classes. For the neutral class, although our model typically does not perform best, its performance is close to that of the best model. These results further demonstrate the superior performance of our PHNN model.
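Table A1. Per-class precision, recall, and F1-score of each model on the three datasets (more details about Table 2).

| Model | Class | Restaurant Precision | Restaurant Recall | Restaurant F1-Score | Laptop Precision | Laptop Recall | Laptop F1-Score | Twitter Precision | Twitter Recall | Twitter F1-Score |
|---|---|---|---|---|---|---|---|---|---|---|
| Our PHNN | Negative | 0.8657 | 0.8878 | 0.8766 | 0.6909 | 0.8906 | 0.7782 | 0.7368 | 0.7283 | 0.7326 |
| | Neutral | 0.8359 | 0.5459 | 0.6605 | 0.7417 | 0.6627 | 0.7000 | 0.7684 | 0.8150 | 0.7910 |
| | Positive | 0.8976 | 0.9753 | 0.9348 | 0.9286 | 0.8768 | 0.9020 | 0.7857 | 0.6994 | 0.7401 |
| AOA | Negative | 0.6729 | 0.7347 | 0.7024 | 0.5174 | 0.8125 | 0.6322 | 0.5990 | 0.6994 | 0.6453 |
| | Neutral | 0.6591 | 0.2959 | 0.4085 | 0.6465 | 0.3787 | 0.4776 | 0.6979 | 0.7746 | 0.7342 |
| | Positive | 0.8301 | 0.9327 | 0.8784 | 0.8432 | 0.8358 | 0.8395 | 0.7547 | 0.4624 | 0.5735 |
| ATAE-LSTM | Negative | 0.6742 | 0.6122 | 0.6417 | 0.4509 | 0.6094 | 0.5183 | 0.6488 | 0.6301 | 0.6393 |
| | Neutral | 0.6455 | 0.3622 | 0.4641 | 0.5607 | 0.3550 | 0.4348 | 0.7046 | 0.7514 | 0.7273 |
| | Positive | 0.8173 | 0.9341 | 0.8718 | 0.7709 | 0.8094 | 0.7897 | 0.6258 | 0.5607 | 0.5915 |
| TD-LSTM | Negative | 0.6985 | 0.7092 | 0.7038 | 0.4615 | 0.6094 | 0.5253 | 0.7171 | 0.6301 | 0.6708 |
| | Neutral | 0.6667 | 0.3163 | 0.4291 | 0.5726 | 0.3964 | 0.4685 | 0.7238 | 0.7572 | 0.7401 |
| | Positive | 0.8213 | 0.9341 | 0.8740 | 0.8267 | 0.8534 | 0.8398 | 0.6573 | 0.6763 | 0.6667 |
| ASGCN | Negative | 0.6575 | 0.7347 | 0.6940 | 0.5466 | 0.6875 | 0.6090 | 0.6848 | 0.6532 | 0.6686 |
| | Neutral | 0.6525 | 0.3929 | 0.4904 | 0.7129 | 0.4260 | 0.5333 | 0.7118 | 0.8208 | 0.7624 |
| | Positive | 0.8582 | 0.9231 | 0.8895 | 0.7979 | 0.8798 | 0.8368 | 0.7109 | 0.5260 | 0.6047 |
| IAN | Negative | 0.6875 | 0.6735 | 0.6804 | 0.4877 | 0.7734 | 0.5982 | 0.6541 | 0.6994 | 0.6760 |
| | Neutral | 0.6935 | 0.2194 | 0.3333 | 0.6351 | 0.2781 | 0.3868 | 0.7114 | 0.7197 | 0.7155 |
| | Positive | 0.7945 | 0.9451 | 0.8632 | 0.7922 | 0.8387 | 0.8148 | 0.6178 | 0.5607 | 0.5879 |
| R-GAT | Negative | 0.7539 | 0.7347 | 0.7442 | 0.6069 | 0.6875 | 0.6447 | 0.6936 | 0.6936 | 0.6936 |
| | Neutral | 0.6049 | 0.5000 | 0.5475 | 0.6972 | 0.4497 | 0.5468 | 0.7620 | 0.7312 | 0.7463 |
| | Positive | 0.8722 | 0.9190 | 0.8950 | 0.7917 | 0.8915 | 0.8386 | 0.6364 | 0.6879 | 0.6611 |
| DualGCN | Negative | 0.7906 | 0.7704 | 0.7804 | 0.6104 | 0.7344 | 0.6667 | 0.7421 | 0.6982 | 0.7195 |
| | Neutral | 0.6031 | 0.5969 | 0.6000 | 0.7083 | 0.6108 | 0.6559 | 0.7294 | 0.8423 | 0.7818 |
| | Positive | 0.8951 | 0.9037 | 0.8994 | 0.8593 | 0.8516 | 0.8554 | 0.7462 | 0.5640 | 0.6424 |
| SSEGCN | Negative | 0.7701 | 0.7347 | 0.7520 | 0.6438 | 0.7344 | 0.6861 | 0.7135 | 0.7219 | 0.7176 |
| | Neutral | 0.7674 | 0.5051 | 0.6092 | 0.7006 | 0.6587 | 0.6790 | 0.7725 | 0.8185 | 0.7948 |
| | Positive | 0.8643 | 0.9546 | 0.9072 | 0.8815 | 0.8605 | 0.8709 | 0.7467 | 0.6512 | 0.6957 |
| BERT-SPC | Negative | 0.8287 | 0.7653 | 0.7958 | 0.6353 | 0.8438 | 0.7248 | 0.7112 | 0.7688 | 0.7389 |
| | Neutral | 0.6825 | 0.6582 | 0.6701 | 0.6846 | 0.6036 | 0.6415 | 0.7915 | 0.7572 | 0.7740 |
| | Positive | 0.9107 | 0.9382 | 0.9242 | 0.9185 | 0.8592 | 0.8879 | 0.7184 | 0.7225 | 0.7205 |
| AEN-BERT | Negative | 0.7225 | 0.7704 | 0.7457 | 0.7653 | 0.5859 | 0.6637 | 0.6984 | 0.7630 | 0.7293 |
| | Neutral | 0.6703 | 0.3112 | 0.4251 | 0.6667 | 0.5917 | 0.6270 | 0.7486 | 0.8006 | 0.7737 |
| | Positive | 0.8537 | 0.9615 | 0.9044 | 0.8103 | 0.9267 | 0.8646 | 0.7594 | 0.5838 | 0.6601 |
| R-GAT+BERT | Negative | 0.7725 | 0.8316 | 0.8010 | 0.6386 | 0.8281 | 0.7211 | 0.7151 | 0.7399 | 0.7273 |
| | Neutral | 0.6911 | 0.6735 | 0.6822 | 0.7025 | 0.6568 | 0.6789 | 0.7863 | 0.7977 | 0.7920 |
| | Positive | 0.9262 | 0.9135 | 0.9198 | 0.9172 | 0.8446 | 0.8794 | 0.7346 | 0.6879 | 0.7104 |
| DualGCN+BERT | Negative | 0.8168 | 0.7959 | 0.8062 | 0.7462 | 0.7578 | 0.7519 | 0.7396 | 0.7396 | 0.7396 |
| | Neutral | 0.7616 | 0.5867 | 0.6628 | 0.6630 | 0.7305 | 0.6952 | 0.7590 | 0.8155 | 0.7862 |
| | Positive | 0.8945 | 0.9560 | 0.9242 | 0.9151 | 0.8635 | 0.8885 | 0.7755 | 0.6628 | 0.7147 |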
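The per-class F1-scores in Table A1 can be related to the macro-F1 values in Table 2 with a short calculation. The sketch below assumes the standard definition of macro-F1 as the unweighted mean of the per-class F1-scores; the function and variable names are illustrative. The inputs are the PHNN rows of Table A1 in the order Negative, Neutral, Positive.

```python
def macro_f1(per_class_f1: list[float]) -> float:
    """Unweighted mean of per-class F1-scores (assumed definition of macro-F1)."""
    if not per_class_f1:
        raise ValueError("at least one class is required")
    return sum(per_class_f1) / len(per_class_f1)


# PHNN per-class F1-scores from Table A1: [Negative, Neutral, Positive].
PHNN_F1 = {
    "Restaurant": [0.8766, 0.6605, 0.9348],
    "Laptop": [0.7782, 0.7000, 0.9020],
    "Twitter": [0.7326, 0.7910, 0.7401],
}

if __name__ == "__main__":
    for dataset, scores in PHNN_F1.items():
        print(f"{dataset}: macro-F1 = {100 * macro_f1(scores):.2f}")
```

For Restaurant and Laptop this gives 82.40 and 79.34, matching Table 2. For Twitter it gives 75.46 against the reported 75.45, a difference in the last digit that is plausibly due to the per-class values having already been rounded to four decimals.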

References

  1. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]
  2. Yadav, R.K.; Jiao, L.; Goodwin, M.; Granmo, O.-C. Positionless aspect based sentiment analysis using attention mechanism. Knowl. Based Syst. 2021, 226, 107136. [Google Scholar] [CrossRef]
  3. Ma, D.; Li, S.; Zhang, X.; Wang, H. Interactive Attention Networks for Aspect-Level Sentiment Classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 4068–4074. [Google Scholar]
  4. Zhang, C.; Li, Q.; Song, D. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4567–4577. [Google Scholar]
  5. Huang, B.; Carley, K. Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5468–5476. [Google Scholar]
  6. Zhao, P.; Hou, L.; Wu, O. Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl. Based Syst. 2020, 193, 105443. [Google Scholar] [CrossRef]
  7. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
  8. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  9. Ranaldi, L.; Pucci, G. Knowing Knowledge: Epistemological Study of Knowledge in Transformers. Appl. Sci. 2023, 13, 677. [Google Scholar] [CrossRef]
  10. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  11. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 1877–1901. [Google Scholar]
  12. Schick, T.; Schütze, H. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 19–23 April 2021; pp. 255–269. [Google Scholar]
  13. Fan, C.; Gao, Q.; Du, J.; Gui, L.; Xu, R.; Wong, K.-F. Convolution-based Memory Network for Aspect-based Sentiment Analysis. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 1161–1164. [Google Scholar]
  14. Joshi, A.; Prabhu, A.; Shrivastava, M.; Varma, V. Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 2482–2491. [Google Scholar]
  15. Xu, Q.; Zhu, L.; Dai, T.; Yan, C. Aspect-based sentiment classification with multi-attention network. Neurocomputing 2020, 388, 135–143. [Google Scholar] [CrossRef]
  16. Zhang, B.; Xiong, D.; Su, J.; Zhang, M. Learning better discourse representation for implicit discourse relation recognition via attention networks. Neurocomputing 2018, 275, 1241–1249. [Google Scholar] [CrossRef]
  17. Sun, K.; Zhang, R.; Mensah, S.; Mao, Y.; Liu, X. Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5678–5687. [Google Scholar]
  18. Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational Graph Attention Network for Aspect-based Sentiment Analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3229–3238. [Google Scholar]
  19. Sun, C.; Huang, L.; Qiu, X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 380–385. [Google Scholar]
  20. Yin, D.; Meng, T.; Chang, K.-W. SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3695–3706. [Google Scholar]
  21. Alexandridis, G.; Korovesis, K.; Varlamis, I.; Tsantilas, P.; Caridakis, G. Emotion detection on Greek social media using Bidirectional Encoder Representations from Transformers. In Proceedings of the 25th Pan-Hellenic Conference on Informatics, Volos, Greece, 26–28 November 2021; pp. 28–32. [Google Scholar]
  22. Sirisha, U.; Chandana, B.S. Aspect based Sentiment & Emotion Analysis with ROBERTa, LSTM. Int. J. Adv. Comput. Sci. Appl. 2022, 11, 7. [Google Scholar] [CrossRef]
  23. Li, C.; Gao, F.; Bu, J.; Xu, L.; Chen, X.; Gu, Y.; Shao, Z.; Zheng, Q.; Zhang, N.; Wang, Y.; et al. SentiPrompt: Sentiment Knowledge Enhanced Prompt-Tuning for Aspect-Based Sentiment Analysis. arXiv 2021, arXiv:2109.08306. [Google Scholar]
  24. Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 3816–3830. [Google Scholar]
  25. Hu, S.; Ding, N.; Wang, H.; Liu, Z.; Wang, J.; Li, J.; Wu, W.; Sun, M. Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 2225–2240. [Google Scholar]
  26. Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 27–35. [Google Scholar]
  27. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MA, USA, 22–27 June 2014; pp. 49–54. [Google Scholar]
  28. Huang, B.; Ou, Y.; Carley, K.M. Aspect Level Sentiment Classification with Attention-over-Attention Neural Networks. In Proceedings of the 2018 Conference on Social, Cultural, and Behavioral Modeling; Lecture Notes in Computer Science, Washington, DC, USA, 10–13 July 2018; pp. 197–206. [Google Scholar]
  29. Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for Aspect-level Sentiment Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615. [Google Scholar]
  30. Tang, D.; Qin, B.; Feng, X.; Liu, T. Effective LSTMs for Target-Dependent Sentiment Classification. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 3298–3307. [Google Scholar]
  31. Song, Y.; Wang, J.; Jiang, T.; Liu, Z.; Rao, Y. Attentional Encoder Network for Targeted Sentiment Classification. arXiv 2019, arXiv:1902.09314. [Google Scholar]
  32. Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. Dual Graph Convolutional Networks for Aspect-based Sentiment Analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 6319–6329. [Google Scholar]
  33. Zhang, Z.; Zhou, Z.; Wang, Y. SSEGCN: Syntactic and Semantic Enhanced Graph Convolutional Network for Aspect-based Sentiment Analysis. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–16 July 2022; pp. 4916–4925. [Google Scholar]
Figure 1. A sentence with its syntactic dependency tree.
Figure 2. The overall architecture of the PHNN model.
Figure 3. The prompt text construction.
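Table 1. Dataset information.

| Dataset | Positive Train | Positive Test | Neutral Train | Neutral Test | Negative Train | Negative Test |
|---|---|---|---|---|---|---|
| Twitter | 1561 | 173 | 3127 | 346 | 1560 | 173 |
| Restaurant | 2164 | 728 | 637 | 196 | 807 | 196 |
| Laptop | 994 | 341 | 464 | 169 | 870 | 128 |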
Table 2. Comparison of accuracy and macro-F1 on three datasets.

| Model | Restaurant Accuracy (%) | Restaurant Macro-F1 (%) | Laptop Accuracy (%) | Laptop Macro-F1 (%) | Twitter Accuracy (%) | Twitter Macro-F1 (%) |
|---|---|---|---|---|---|---|
| AOA | 78.66 | 66.31 | 71.00 | 64.98 | 67.77 | 65.10 |
| ATAE-LSTM | 77.77 | 65.92 | 64.89 | 58.09 | 67.34 | 65.27 |
| TD-LSTM | 78.66 | 66.90 | 68.34 | 61.12 | 70.52 | 69.25 |
| ASGCN | 79.73 | 69.13 | 72.10 | 65.97 | 70.52 | 67.86 |
| IAN | 77.05 | 62.57 | 67.71 | 59.99 | 67.49 | 65.98 |
| R-GAT | 81.34 | 72.89 | 73.35 | 67.66 | 71.10 | 70.03 |
| DualGCN | 82.66 | 75.99 | 76.42 | 72.60 | 73.56 | 71.46 |
| SSEGCN | 83.74 | 75.61 | 78.16 | 74.53 | 75.18 | 73.60 |
| BERT-SPC | 85.89 | 79.67 | 78.84 | 75.14 | 75.14 | 74.45 |
| AEN-BERT | 81.43 | 69.17 | 76.96 | 71.84 | 73.70 | 72.11 |
| R-GAT+BERT | 85.71 | 80.10 | 79.31 | 74.68 | 75.58 | 74.32 |
| DualGCN+BERT | 86.33 | 79.77 | 80.70 | 77.85 | 75.78 | 74.69 |
| Our PHNN | 88.48 | 82.40 | 82.29 | 79.34 | 76.45 | 75.45 |
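As a consistency check, the PHNN accuracies in Table 2 can be recovered from the per-class recall values in Table A1 and the test-set sizes in Table 1, since recall times class size gives the number of correctly classified instances in that class. The sketch below is only such a cross-check under that reasoning; the names are illustrative and the rounding to whole instances is an assumption needed because the reported recalls are themselves rounded.

```python
# Test-set sizes per class, from Table 1.
TEST_SIZES = {
    "Restaurant": {"negative": 196, "neutral": 196, "positive": 728},
    "Laptop": {"negative": 128, "neutral": 169, "positive": 341},
    "Twitter": {"negative": 173, "neutral": 346, "positive": 173},
}

# PHNN per-class recall, from Table A1.
PHNN_RECALL = {
    "Restaurant": {"negative": 0.8878, "neutral": 0.5459, "positive": 0.9753},
    "Laptop": {"negative": 0.8906, "neutral": 0.6627, "positive": 0.8768},
    "Twitter": {"negative": 0.7283, "neutral": 0.8150, "positive": 0.6994},
}


def accuracy_from_recall(recall: dict[str, float], sizes: dict[str, int]) -> float:
    """Overall accuracy = correctly classified instances / all instances.

    Each class contributes recall * size correct instances; the product is
    rounded to the nearest integer because the reported recall is rounded.
    """
    correct = sum(round(recall[label] * sizes[label]) for label in sizes)
    return correct / sum(sizes.values())


if __name__ == "__main__":
    for dataset in TEST_SIZES:
        acc = accuracy_from_recall(PHNN_RECALL[dataset], TEST_SIZES[dataset])
        print(f"{dataset}: accuracy = {100 * acc:.2f}")
```

The recovered values are 88.48, 82.29, and 76.45 for Restaurant, Laptop, and Twitter, which agree with the PHNN row of Table 2.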
Table 3. Ablation study of the PHNN model, where w/o denotes the model with the named component removed.

| Model | Restaurant Accuracy (%) | Restaurant Macro-F1 (%) | Laptop Accuracy (%) | Laptop Macro-F1 (%) | Twitter Accuracy (%) | Twitter Macro-F1 (%) |
|---|---|---|---|---|---|---|
| Our PHNN | 88.48 | 82.40 | 82.29 | 79.34 | 76.45 | 75.45 |
| w/o prompt | 87.50 | 82.02 | 79.31 | 76.65 | 75.26 | 73.41 |
| w/o GCN | 85.22 | 77.86 | 80.72 | 78.04 | 73.99 | 73.09 |
| w/o CNN | 86.23 | 80.54 | 81.35 | 76.96 | 73.55 | 72.57 |
| w/o BiLSTM | 86.25 | 81.62 | 81.50 | 77.98 | 73.28 | 72.42 |
Table 4. Case analysis of PHNN compared with state-of-the-art baselines.

| Sentence | Aspect Words | AEN_BERT | BERT_SPC | ATAE_LSTM | ASGCN | Our PHNN | True Label |
|---|---|---|---|---|---|---|---|
| The portions of the food that came out were mediocre. | portions of the food | O | O | N × | N × | O | O |
| The falafel was rather over cooked and dried but the chicken was fine. | falafel | N | N | P × | N | N | N |
| | chicken | N × | O × | N × | N × | P | P |
| Great food but the service was dreadful! | food | N × | P | P | P | P | P |
| | service | N | N | N | P × | N | N |
| Other than not being a fan of click pads (industry standard these days) and the lousy internal speakers, it’s hard for me to find things about this notebook I don’t like, especially considering the $350 price tag. | click pads | P × | N | N | N | N | N |
| | price tag | P | N × | N × | N × | N × | P |
| | internal speakers | P × | N | N | N | N | N |

Share and Cite

MDPI and ACS Style

Zhu, W.; Luo, J.; Miao, Y.; Liu, P. PHNN: A Prompt and Hybrid Neural Network-Based Model for Aspect-Based Sentiment Classification. Electronics 2023, 12, 4126. https://doi.org/10.3390/electronics12194126