1. Introduction
Aspect-based sentiment analysis (ABSA) is a fine-grained task in sentiment analysis that aims to identify aspect terms (a), their corresponding opinion terms (o), and the sentiment polarity (s) expressed in a sentence, as illustrated in
Figure 1.
As coarse-grained sentiment analysis can no longer satisfy users’ evolving needs, aspect-based sentiment analysis has garnered increasing attention in recent years. ABSA consists of several subtasks, namely Aspect Term Extraction (AE), Opinion Term Extraction (OE), and Aspect Sentiment Classification (ASC). These subtasks are strongly correlated with each other and are often combined to create a variety of compound tasks. For instance, Zhao et al. [
1] proposed the Aspect Opinion Pair Extraction (AOPE) task, which aims to extract aspect terms and their corresponding opinion terms in pairs. Similarly, Peng et al. [
2] proposed the Aspect Sentiment Triple Extraction (ASTE) task, which involves extracting both aspect terms and corresponding opinion terms from sentences and determining the aspect sentiment polarity.
Previous research on ABSA has primarily focused on single-task forms, where AE, OE, and ASC can only produce results for one specific type of task. Regarding the AE task, most studies treat it as a sequence labeling problem [
3,
4]. The OE task is typically considered an auxiliary task to the AE task. Since opinions and aspects often co-occur in a sentence, it is illogical to extract opinions alone without considering the corresponding aspects. The ASC task is a sentiment classification problem that is based on aspect terms [
5,
6]. In this task, sentiment judgments are made by considering the aspect terms and their contexts.
Most studies now utilize either an end-to-end approach or a pipeline approach to solve multiple tasks simultaneously. Taking end-to-end ABSA as an example, it aims to predict both the aspect terms mentioned in the text and their corresponding sentiment polarity.
There are several issues with existing methods for aspect-based sentiment analysis, including the following:
C1: The absence of interaction between different target labels often leads to reduced accuracy in compound tasks. In general, the majority of tasks rely on sequence labels for term extraction and utilize specific classification networks for sentiment classification. However, these methods ignore the correlation between different tasks. During training, the rich semantic information conveyed by different labels is often neglected, and the interaction between labels is not taken into account. For instance, when given the sentence “The service is rude, but the burgers are delicious”, it is easier to identify the sentiment of the aspect term “service” as “negative” if we consider its corresponding opinion term “rude”. If the model understands that “delicious” is an adjective used to describe food, it can easily identify the aspect-opinion pair “(burgers, delicious)”.
C2: Differences in training methods between pre-trained models and downstream tasks often make it challenging to leverage the knowledge of pre-trained models. This can result in forgetting of prior knowledge, making it difficult to recognize multiple sentiment elements. The pre-trained model uses a natural-language form for self-supervised learning, which differs from the training format of downstream tasks, making it difficult to take full advantage of the knowledge the model acquired during pre-training. As an example, Zhang et al. [
7] proposed using a generative model for aspect-based sentiment analysis, where the extraction-style target of the ASTE task takes the form (aspect, opinion, sentiment polarity); this format differs from the learning format of the model’s pre-training stage and thus produces errors.
C3: It is difficult to achieve good results with low resources, making existing methods inefficient on restricted or limited datasets. To achieve satisfactory results, existing ABSA methods typically require at least several thousand labeled examples to fine-tune the model. In some domains, the lack of large annotated datasets and the high cost of manual annotation pose challenges to using traditional ABSA methods. Therefore, there has been a growing focus on how to achieve good results with a small amount of data.
Inspired by the recent success of transforming various NLP tasks into text generation problems [
8,
9,
10], we use a unified generation framework to solve various ABSA tasks without the need for model network structure design for specific tasks. We transform the ABSA task into a text generation problem to enhance the interaction of different target elements, which can avoid error propagation in compound tasks. Due to the variety of ABSA tasks, we focus on aspect sentiment classification (ASC), aspect term extraction and sentiment classification (AESC), and aspect sentiment triplet extraction (ASTE).
In this paper, our main contributions are as follows:
- (1)
We transform the ABSA task into a unified generative task, eliminating the need to design specific network structures for each task. The connection between sentiment elements is strengthened in the compound task. For example, the F1 scores for the AESC and ASTE tasks are improved by 4.86% and 10.17%, respectively, on the Laptop 14 dataset.
- (2)
We propose three instruction prompts to redesign the input and output formats of the model. This approach enables downstream tasks to adapt to the pre-trained model, reduces the errors between the pre-trained model and the downstream tasks, and improves the utilization of the pre-trained model.
- (3)
We use the instruction prompt method for few-shot learning on ABSA tasks. Experimental results show that using only 10% of the original data can achieve 80% of the model performance compared to fully supervised learning. This can reduce annotation costs while obtaining good models.
- (4)
We conducted experiments on three ABSA tasks using both fully supervised and few-shot learning approaches on four benchmark datasets. Our proposed approach outperformed the state-of-the-art in nearly all cases, demonstrating its effectiveness in improving ABSA performance. For example, on the Laptop 14 dataset, our fully supervised learning approach yields a 1.46% increase in F1 score on the ASTE task, while our few-shot learning approach achieves an F1 score of 50.62%, which is 82% of the fully supervised model’s performance.
2. Related Work
Sentence-level sentiment analysis mainly determines the sentiment expressed in the entire text, and there is currently some research [
11,
12,
13] in this area. However, coarse-grained sentiment analysis can no longer meet people’s needs, so there is a growing exploration of aspect-based sentiment analysis methods to extract finer-grained sentiment information. Early research work on aspect-based sentiment analysis could only identify single sentiment elements. For example, the aspect term extraction task aims to extract the aspect terms mentioned in a sentence [
14,
15]. The aspect sentiment classification task aims to predict the sentiment polarity of aspects in a sentence [
16]. These tasks are all single tasks. A single sentiment element is insufficient to comprehend the complete aspect-level perspective. This requires not only extracting multiple sentiment elements but also understanding the correspondences and dependencies between the sentiment elements. In recent years, most studies have focused on the ABSA compound task, which acquires multiple sentiment elements simultaneously. For example, the Aspect Sentiment Triplet Extraction (ASTE) task requires the extraction of aspect terms, corresponding opinion terms and aspect sentiment, such as the (atmosphere, nice, positive) triplet extracted from the example in
Figure 1.
To improve the performance of ABSA tasks, it has become common to use pre-trained models, which have been shown to achieve better results. The modeling paradigms commonly used in ABSA tasks include Sequence-level Classification (SeqClass), Token-level Classification (TokenClass), Machine Reading Comprehension (MRC), and Sequence-to-Sequence modeling (Seq2Seq), as shown in
Figure 2. Each modeling paradigm serves as a generic framework for handling ABSA tasks, on which researchers can design modifications to achieve better results. In addition, the complex nature of ABSA tasks often requires the use of the pipeline paradigm, where multiple models are combined to make the final prediction.
In recent years, the study of ABSA tasks has gradually shifted from single tasks to the relatively more complex compound tasks. ABSA compound tasks are usually solved using the pipeline method [
2,
17,
18,
19] or end-to-end method [
20,
21,
22,
23]. The pipeline approach is relatively easy to implement, requiring only the sequential connection of solutions for each subtask to obtain the final result. However, it suffers from the error propagation problem, i.e., errors generated by the earlier models propagate to the later models and affect the overall output results. The end-to-end method of solving ABSA tasks typically requires a deeper understanding of the tasks and models involved, as well as specific design of input and output formats. This can make it a more challenging approach compared to the pipeline method.
The different ABSA subtasks are correlated with one another. Existing work designs a variety of task-specific classification networks, which makes it difficult for models to share knowledge across tasks and introduces potential errors. Zhang et al. [
7] transformed the ABSA task into a text generation problem and proposed two paradigms, namely annotation-style and extraction-style modeling, to solve the Aspect Opinion Pair Extraction (AOPE), Unified ABSA (UABSA), Aspect Sentiment Triplet Extraction (ASTE), and Target Aspect Sentiment Detection (TASD) tasks with good results. Different ABSA tasks require different models, and it is challenging to solve multiple tasks simultaneously with a single model. Yan et al. [
10] proposed a unified generative framework that uses the BART generative model and pointer networks to solve the seven ABSA tasks. Due to the strong correlation between different tasks, Zhang et al. [
24] proposed the Paraphrase modeling paradigm to generate sentiment elements in the form of natural language, which can fully explore the essence of sentiment elements.
The above methods require a large amount of labeled data for training. However, there are not enough labeled datasets in some fields. Few-shot learning or zero-shot learning is the ideal solution for completing ABSA tasks in low-resource scenarios. Since the advent of GPT-3 [
25], prompt-tuning has received greater attention and a series of research works have emerged. Compared with standard fine-tuning, prompt-tuning can effectively stimulate the knowledge in the pre-trained model and achieve better performance in few-shot learning. Liu et al. [
26] first summarized the research on prompt learning, which greatly advanced the field. Unlike traditional learning methods, prompt learning is based on language models that directly model text probabilities, and it adapts to new scenarios with zero-shot or few-shot data by defining prompt templates that guide model output and tap into the model’s latent knowledge. The purpose of model fine-tuning is to adapt the model to the requirements of the downstream task, which differ from the model’s pre-training stage. In contrast, prompt learning aims to adapt the downstream task to the model, bridging the gap between pre-training and fine-tuning. Recent research in Pattern-Exploiting Training (PET) [
27] has explored the use of semi-supervised learning, where the input data are transformed into a completion format to assist language models in understanding the given task. PET outperforms both supervised training and strong semi-supervised training methods in low-resource environments. The findings indicate that prompt learning can result in improved performance for pre-trained models in various NLP tasks, including text classification [
28], relation classification [
29], named entity recognition (NER) [
30], and so on.
Prompt learning has demonstrated superior performance in NLP tasks, and we aim to apply it to ABSA tasks. To explore the potential benefits of prompt learning on the complex ABSA task, we apply this technique to both single and compound ABSA tasks and evaluate its effectiveness in few-shot learning scenarios with limited labeled data.
3. Methodology
Aspect-based sentiment analysis (ABSA) aims to identify aspect terms, corresponding opinion terms, and aspect sentiment polarity from the given text. This is demonstrated in
Figure 1.
Figure 3.
Overview of the ABSA generation framework using the ASTE task as an example.
Figure 3.
Overview of the ABSA generation framework using the ASTE task as an example.
Given a text T, the ABSA task obtains the triplet (a, o, s), whose elements correspond to the aspect term, opinion term, and sentiment polarity, respectively. The aspect term a and the opinion term o are spans of the sentence T, where a ⊆ T and o ⊆ T, and T = {w1, w2, …, wn} contains all the words that appear in the text. The sentiment polarity s takes values in the set of sentiment categories {positive, negative, neutral}. In this study, a verbalizer is used to convert the sentiment categories into words that are represented in the output, i.e., {‘positive’: ‘great’, ‘negative’: ‘bad’, ‘neutral’: ‘ok’}. Our goal is to use the generative model M to obtain the final target y, which contains all the desired ABSA elements, i.e., y = {(a, o, s)}.
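As an illustration, the verbalizer above can be sketched as a simple bidirectional mapping (a minimal sketch: the dictionary mirrors the mapping stated in the text, while the function names are ours):

```python
# Verbalizer: maps sentiment categories to natural-language words used in
# the generated output, and back again when parsing predictions.
VERBALIZER = {"positive": "great", "negative": "bad", "neutral": "ok"}
INVERSE_VERBALIZER = {word: label for label, word in VERBALIZER.items()}

def to_word(sentiment_label: str) -> str:
    """Convert a sentiment category (e.g., 'positive') to its output word."""
    return VERBALIZER[sentiment_label]

def to_label(output_word: str) -> str:
    """Recover the sentiment category from a generated word."""
    return INVERSE_VERBALIZER[output_word]
```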
In this paper, we train the generative model using fully supervised learning and few-shot learning in the form of instruction prompts to obtain the three sentiment elements of aspect terms, opinion terms, and aspect sentiment.
3.1. Input Transformation
To reduce the discrepancy between model pre-training and downstream-task fine-tuning, we propose three instruction prompt templates, named IPT-a, IPT-b, and IPT-c. Instruction prompts in ABSA can be considered as providing the model with explicit cues, similar to a question-answer format. The prompts guide the model to focus on specific aspects of the input text, such as identifying aspect terms or extracting aspect sentiment, which can be learned through a process of question answering. By providing these prompts, the model is encouraged to learn the relevant ABSA elements more effectively. We transform the raw text data into the format of instruction prompts. The instruction prompt template reformats the input and output formats, as shown in
Figure 3. Examples of the conversions for each task, including ASC, AESC, and ASTE, can be found in
Table 1,
Table 2 and
Table 3, respectively.
Aspect Sentiment Classification (ASC) is a task that aims to classify the sentiment polarity of a given aspect term [
31,
32].
Table 1 illustrates examples of ASC tasks under three different instruction prompt templates.
Table 1 illustrates that the input and output formats for the model are natural language text, which is similar to the pre-training stage of the model. This reduces the discrepancy between the pre-training stage and the fine-tuning stage. Since the output template is fixed, the model needs to learn the form of the output distribution, which places higher demands on the model. We format the output to obtain the desired result.
Aspect Term Extraction and Sentiment Classification (AESC) is a task in which both the aspect terms and their corresponding sentiment polarities are extracted from the given text [
33,
34,
35].
Table 2 presents examples of AESC tasks formatted using three instruction prompt templates.
Aspect Sentiment Triplet Extraction (ASTE) is a complex task to extract sentiment triples (a, o, s) from text [
7,
10].
Table 3 shows examples of ASTE tasks under three instruction prompt templates.
Under the three prompt templates, when the text contains multiple targets, the output is separated by a special token [SSEP], similar to the approach proposed by Zhang et al. [
24]. For example, given the text “The pizza is yummy and I like the atmosphere.”, the output of the ASC task is “The pizza is great [SSEP] The atmosphere is great”. The output format of the other tasks is similar.
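The target construction just described can be sketched in a few lines. The “The {aspect} is {word}” pattern below is an illustrative stand-in for the actual IPT templates (which are defined in the tables); only the use of the verbalizer and the joining of multiple targets with [SSEP] follow the text directly:

```python
VERBALIZER = {"positive": "great", "negative": "bad", "neutral": "ok"}

def build_asc_target(targets):
    """Render the ASC output for a sentence with one or more
    (aspect, sentiment) targets, joined by the special token [SSEP].

    Note: 'The {aspect} is {word}' is a hypothetical template for
    illustration, not the exact IPT wording."""
    parts = [f"The {aspect} is {VERBALIZER[sentiment]}"
             for aspect, sentiment in targets]
    return " [SSEP] ".join(parts)
```

For instance, `build_asc_target([("pizza", "positive"), ("atmosphere", "positive")])` reproduces the two-target output shown in the example sentence.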
3.2. Sequence-to-Sequence Learning
Given the input text T, we use the generative model M to generate the target sequence y, where the model input is the instruction-prompt transformation of T, similar to the “Example—Input” column in Table 1, Table 2 and Table 3. Then, the corresponding sentiment elements are extracted from y based on the task requirements, as shown in Figure 3.
We choose the pre-trained T5 model [36], which uses the encoder–decoder architecture of the original transformer. Given an input text T, the encoder first transforms it into a contextual encoding sequence e. Then, the decoder performs conditional probability modeling P_θ(y | e) according to e to obtain the target text y, where θ denotes the model parameters.
At each time step, the output of the decoder at the current time step is obtained from the decoder outputs at the previous time steps and the output e of the encoder. Next, the probability distribution of the next token is obtained using softmax:

h_i = Decoder(e; y_<i)
P(y_i | y_<i, e) = softmax(W h_i)

where h_i is the output of the decoder at the i-th time step, Decoder(·) is the decoding calculation, and W is the trainable parameter.
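The softmax normalization and greedy token selection described here can be sketched in pure Python (an illustrative sketch over a toy logit vector; a real decoder produces logits over the full vocabulary):

```python
import math

def softmax(logits):
    """Convert raw decoder logits into a probability distribution.
    Subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """One greedy-search step: return the token index with the
    highest probability under the softmax distribution."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])
```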
In the training stage, we initialize the model with the pre-trained T5 weights and then further fine-tune the model parameters θ to maximize the conditional likelihood P_θ(y | e) of the target sequence.
In the inference stage, we use greedy search as the decoding strategy: at each time step, the token with the highest probability in the vocabulary is selected as the next token. Finally, the output is parsed to obtain the target sentiment elements. Specifically, we first determine whether there are multiple results based on the special token ‘[SSEP]’ and then parse the sentiment elements of each result. If an output does not conform to the predefined output format, the sentiment element cannot be extracted, representing a decoding failure, which is indicated by ‘None’.
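The parsing step can be sketched as follows. The regular expression assumes a hypothetical output template of the form “The aspect is A, the opinion is O, the sentiment is S”; the actual IPT templates may differ, but the [SSEP] splitting and the None-on-failure behavior follow the text:

```python
import re

# Hypothetical output pattern for illustration; the real IPT templates
# are defined in the paper's tables.
TRIPLET_PATTERN = re.compile(
    r"The aspect is (?P<aspect>.+?), the opinion is (?P<opinion>.+?), "
    r"the sentiment is (?P<sentiment>great|bad|ok)$"
)
INVERSE_VERBALIZER = {"great": "positive", "bad": "negative", "ok": "neutral"}

def parse_aste_output(generated):
    """Split the generated text on [SSEP] and parse each piece into an
    (aspect, opinion, sentiment) triplet; a piece that does not conform
    to the template is a decoding failure and yields None."""
    triplets = []
    for piece in generated.split("[SSEP]"):
        match = TRIPLET_PATTERN.match(piece.strip())
        if match is None:
            triplets.append(None)  # decoding failure
        else:
            triplets.append((match["aspect"], match["opinion"],
                             INVERSE_VERBALIZER[match["sentiment"]]))
    return triplets
```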