Article

Individual- vs. Multiple-Objective Strategies for Targeted Sentiment Analysis in Finances Using the Spanish MTSA 2023 Corpus

by Ronghao Pan, José Antonio García-Díaz * and Rafael Valencia-García
Facultad de Informática, Universidad de Murcia, Campus de Espinardo, 30100 Murcia, Spain
* Author to whom correspondence should be addressed.
Electronics 2024, 13(4), 717; https://doi.org/10.3390/electronics13040717
Submission received: 15 November 2023 / Revised: 7 February 2024 / Accepted: 8 February 2024 / Published: 9 February 2024
(This article belongs to the Special Issue Application of Data Mining in Social Media)

Abstract:
Multitarget sentiment analysis extracts the subjective polarity of a text toward multiple targets simultaneously in a given context. This approach is useful in finance, where opinions about different entities affect each target differently. Examples of possible targets are other companies and society. However, typical multitarget solutions are resource-intensive due to the need to deploy a separate classification model for each target. An alternative is to use multiobjective training approaches, in which a single model handles multiple targets. In this work, we propose the Spanish MTSACorpus 2023, a novel corpus for multitarget sentiment analysis in finance, and we evaluate its reliability with several large language models for multiobjective training. To this end, we compare three design approaches: (i) a Main Economic Target (MET) detection model based on token classification plus a multiclass classification model for sentiment analysis for each target; (ii) a MET detection model based on token classification but replacing the per-target sentiment analysis models with a single multilabel classification model; and (iii) using seq2seq-type models, such as mBART and mT5, to return a response sequence containing the MET and the sentiments of the different targets. Based on the computational resources required and the performance obtained, we consider the fine-tuned mBART to be the best approach, with a mean F1 of 80.300%.

1. Introduction

Analyzing sentiment in the context of business and finance presents a unique set of challenges. Financial documents can contain complex language, jargon, or numerical data that can confound sentiment analysis (SA) models. The sensitivity of markets to rapidly changing events and their susceptibility to small linguistic changes make it difficult to accurately predict market reactions. Another challenge for SA in finance is that there are targets other than the primary target that may perceive sentiment differently than the primary target. Examples of these targets include entire industries, consumers, or the market as a whole. Moreover, a change in sentiment towards one target can have a domino effect on other related targets due to the interconnectedness of different targets. For example, a positive earnings report from a technology company can not only affect sentiment toward that company’s stock but also have a broader impact on the technology sector and even the entire stock market. Similarly, changes in economic policy or geopolitical events can simultaneously affect sentiment toward currencies, commodities, and international markets.
Multitarget sentiment analysis (MTSA) enhances the analysis of text data by providing a more granular understanding of sentiment. Unlike plain SA, MTSA examines sentiment toward specific targets, such as investors, competitors, or society, within the same document. This granularity allows for a more nuanced and meaningful analysis. These insights are not only more informative but also more actionable. For example, decision makers can identify which specific targets are driving sentiment, enabling more precise strategies and responses. Other applications include product reviews, social media monitoring, or customer support.
One way to use MTSA is to train individual classification models for each target. Because each model is trained to analyze sentiment specifically for its designated target, there are several advantages: (i) specificity, as individual models allow for fine-grained SA and potentially obtaining more accurate results; (ii) scalability, as it is easier to scale model analysis across many targets when models are trained individually; and (iii) the isolation of training data, as models can be trained on target-specific datasets, ensuring that the training data are relevant to the sentiment of that particular target. However, the resulting models have two major drawbacks: (i) isolation, as models are not able to learn jointly from the domain, and (ii) time, memory, and inference resources grow in proportion to the number of targets, making these models infeasible in certain real-world situations.
In this work, we evaluate different MTSA strategies for the financial domain. Specifically, we collect and annotate a novel dataset of Spanish tweets about the economy, expressing the Main Economic Target (MET) and sentiments about the MET, the rest of the companies, and society. Next, we evaluate two strategies to reduce the requirements for training reliable MTSA models. The first strategy is multilabel training, which removes the need for a separate sentiment classifier for each target. The second strategy is a novel approach based on multitask learning, where the MET and all sentiments are extracted jointly by generative seq2seq LLMs, such as mBART and mT5.
The remainder of the manuscript is organized as follows. Section 2 reviews recent research on MTSA approaches and a general SA analysis on the financial domain. Section 3 describes the Spanish MTSACorpus 2023 dataset and the pipeline of our benchmark to compare the proposed strategies with the baselines. In Section 4, the results of the baselines, the multilabel strategy, and the generative approach are presented, and they are discussed in Section 5. Finally, the conclusions and suggestions for further work are given in Section 6.

2. Background Information

In this section, we examine novel approaches and strategies for conducting MTSA (see Section 2.1) and background information on SA applied to finances and economics in Spanish and other languages (see Section 2.2).

2.1. Multitarget Sentiment Analysis

SA focuses on extracting the subjective opinions expressed in a document, usually formulated as a binary classification (‘positive’ or ‘negative’), although other schemes add more degrees of subjectivity.
According to the specificity of SA, three levels can be distinguished depending on whether the document is analyzed as a whole (document-based), sentence by sentence (sentence-based), or by aspect (aspect-based, ABSA). Of these approaches, ABSA is the only one that considers that a piece of text can express more than one sentiment: a sentiment is identified for each subtopic. A detailed survey on ABSA can be found in [1], where the authors find that deep learning architectures based on transformers are the most popular way to build ABSA systems. It is also worth noting that, depending on the domain, it is often necessary to develop custom datasets.
A novel strategy for performing SA that is attracting some academic attention is known as MTSA. Unlike traditional SA, which classifies the overall sentiment, or ABSA, which focuses on specific aspects within a single target, MTSA expands its scope by considering that there is more than one target that could be interested in the sentiment. Therefore, MTSA allows for the simultaneous analysis of sentiment toward multiple targets, such as the industry, company, or consumer, all within the same text. Similar to other automatic document classification tasks, recent advances in natural language processing (NLP) have led to the development of deep learning models, including transformer-based architectures, designed to capture complex relationships between different targets and their associated sentiments.
In Spanish, the work in [2] describes an MTSA strategy composed of several individual models that detect the MET and obtain its polarity as well as the sentiments toward other companies and citizens. That work also proposes a novel corpus composed of two data sources, both in Spanish: tweets about finance and headlines from digital newspapers. This dataset was evaluated by using automatic document classification models based on different LLMs, where the best results were obtained with LLMs trained on Spanish datasets (MarIA and BETO). A multilabel classification approach was also evaluated and obtained similar results; however, that approach does not take MET extraction into account. It is worth mentioning that the dataset from [2] has been extended and used as a shared task in IberLEF 2023 [3].

2.2. Sentiment Analysis toward Finances and Economics

Several papers apply SA to the financial domain. In [4], the authors used a dataset of about 1 million messages from StockTwits to test the reliability of different features and machine learning algorithms used in financial SA. The authors found that statistical features such as bigrams or emoticons significantly improved the performance of the models, whereas other traditional machine learning algorithms did not improve performance. The authors also provide empirical evidence of a correlation between the preprocessing method and the size of the dataset with two variables: investor sentiment and stock returns. In [5], the authors developed a platform to evaluate several SA methods based on the integration of text features with deep and machine learning classifiers on several financial and economic datasets. Their results showed that transformers achieved state-of-the-art SA results; notably, lightweight distilled versions of transformers achieved results similar to their larger counterparts. In [6], a method called FiGAS (Fine-Grained Aspect-based Sentiment Analysis) was presented, which identifies the sentiment associated with specific topics in each sentence and assigns these topics real-valued polarity scores between −1 and +1. It is an unsupervised approach for the economic and financial domains, using a specialized lexicon provided by the authors. Some financial datasets that have been used in various studies, such as [7,8], for the evaluation and development of sentiment analysis systems are:
  • TRC2-Financial. This is a subset of the TRC2 (https://trec.nist.gov/data/reuters/reuters.html accessed on 7 February 2024) dataset from Reuters, consisting of 1.8 million articles published between 2008 and 2010. The version used in [7] contains 46,143 documents with more than 29 million words and almost 400,000 sentences, which have been filtered for some key financial terms to make the corpus more relevant.
  • Financial PhraseBank. This is a dataset consisting of 4845 English sentences randomly selected from financial news available in the LexisNexis database [9]. These sentences were then annotated by 16 people with experience in finance and economics. The annotators were asked to assign labels based on how they thought the information in the sentence might affect the stock price of the company mentioned. The dataset also provides information on the level of agreement between the annotators [7].
  • FiQA. This is a dataset consisting of pairs of questions and answers about financial reports written by financial experts with annotated gold reasoning programs to ensure a complete explanation [9].
For Spanish, related work is scarce. For example, in [10], the authors explored different strategies for combining feature sets to improve the performance of applying SA to financial texts. In that paper, the authors published a dataset of nearly 16k tweets annotated with three levels of sentiment. The combination of linguistic features and embedding-based features achieved the best result in their experiments (a weighted F1 score of 73.159%).

3. Materials and Methods

In this section, we present a summary of the materials, methods, and proposed approach. Figure 1 shows the pipeline of this work. In summary, the process is as follows. First, we collected tweets from Spanish economists and digital newspapers by using the UMUCorpusClassifier tool [11]. Second, the corpus was preprocessed to normalize the text format. Third, the corpus was divided into training, validation, and test subsets in a 60-20-20 ratio. Fourth, to analyze sentiment for different targets in financial tweets, we experimented with three approaches: (i) fine-tuning pretrained transformer-based models for SA and MET detection, i.e., a sentiment classification model for each target plus a MET detection model; (ii) a MET detection model and a single multilabel classification model covering all targets; and (iii) fine-tuning an encoder–decoder model for target extraction and SA for the three targets. For each approach, we evaluated different pretrained models, both monolingual and multilingual. Finally, we performed an evaluation to select the best approach.
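As an illustration, the 60-20-20 stratified split can be sketched in a few lines of Python. The grouping-by-label logic and the `stratified_split` name are ours, not the paper's (the actual split tooling is not specified):

```python
import random
from collections import defaultdict

def stratified_split(items, labels, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split items into train/val/test while preserving label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train, val, test = [], [], []
    # Split each label group separately so every subset keeps the label mix.
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Stratifying per label group, rather than splitting the shuffled dataset directly, is what avoids the sampling bias mentioned in Section 3.1.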

3.1. Spanish MTSACorpus 2023

The Spanish MTSACorpus 2023 is compiled from X (formerly Twitter) by using the UMUCorpusClassifier tool [11]. Twitter is a valuable data source for SA because of several advantages. First, this social network provides a large and continuously updated stream of text data, offering a wide range of topics and real-time insights. Second, the brevity of microblogging encourages concise and focused language, making it suitable for text classification tasks. In addition, tweets often reflect the latest trends and public opinion, making them relevant for monitoring and analyzing current events and sentiment related to finance. Finally, hashtags and keywords facilitate the compilation process.
The Spanish MTSACorpus 2023 is compiled from the timelines of Spanish economists and digital newspapers focused on finance, following an approach similar to [10]. Some of the users are (i) el Economista (https://www.eleconomista.es/ accessed on 7 February 2024), (ii) Expansión (https://www.expansion.com accessed on 7 February 2024), and (iii) Cinco Días (https://cincodias.elpais.com/ accessed on 7 February 2024), to name a few. Figure 2 contains an example of a tweet from the dataset, stating that employment in the euro area and the EU has reached a new record, despite the economic slowdown.
The annotation process was performed by four annotators, who annotated the sentiment toward the three targets and extracted the MET. The annotation rules for the MET were that the MET should appear literally in the text, that it should be as short as possible while preserving the semantics, and that leading pronouns should be omitted.
Sentiment was annotated on a three-point scale: positive, neutral, and negative, with positive reflecting favorable opinions, negative indicating unfavorable sentiments, and neutral indicating the absence of a strong opinion. In terms of targets, the annotators considered society to be the general public and companies to be businesses other than the MET, not limited to the same sector.
Disagreements among the annotators were addressed in daily meetings, where annotations that deviated from the annotation guidelines were corrected. Cases where consensus could not be reached were removed from the final dataset.
The resulting dataset contains 7020 instances, which are divided into training, validation, and test sets in a ratio of 60-20-20. Table 1 shows the statistics of the Spanish MTSACorpus 2023 for the three evaluated targets. Note that the training and validation sets are stratified to avoid introducing bias by splitting the dataset.
To analyze the dataset, we observe the information gain for each linguistic feature obtained with the UMUTextStats tool [12]. Figure 3a–c show the top ten normalized linguistic features for the MET, companies, and society, respectively. It can be observed that, regardless of the target, the presence of negative words is correlated with negative emotions, especially lexical items related to sadness and, to a lesser extent, to anger. However, in the case of documents labeled as positive, only positive words are strongly correlated with the MET, but not with companies or society. In fact, we observed that the feature most correlated with positive documents is the percentage symbol for companies and the country lexicon for society. Other features are relevant for two of the targets, as is the case with determiners, for which we observed a correlation when the targets are other companies or society. Finally, we also observe that suffixes with an appropriate connotation are relevant for the MET.
The dataset is available at the following link: https://github.com/NLP-UMUTeam/Spanish-MTSA-2023 (accessed on 7 February 2024). The dataset includes the MET, the sentiment of the three targets, and the split, but in order to respect Twitter’s policy and users’ right to privacy, we only share the Twitter IDs, not the text.

3.2. Approaches

In this section, we explain the different methods and approaches used to create an MTSA system. The MTSA models output the MET, which is the main target of the text, and the sentiments toward three targets: the MET, other companies, and society. Thus, we have one sequence labeling problem, i.e., the extraction of the MET, and three sentiment classification problems, one for each target. In the following sections, we evaluate different strategies for solving all these subtasks.

3.2.1. Strategy A: MET Detection Using Token Classification and Sentiment Classification Using Multiclass Classification

In this approach, we considered the problem of MET detection as a token classification task and SA as a multiclass classification task. In Figure 4, we can see the architecture of the system, and we proceeded to fine-tune different pretrained models by using the examples available in the dataset we collected.
Fine-tuning refers to the process of taking a pretrained model and adjusting it to perform a specific task by using the knowledge acquired in the pretraining phase of the models. Moreover, we can observe that before the fine-tuning process, the dataset was tokenized, which involves converting text strings into integer token IDs that can be read by transformer-based pretrained models through the associated tokenizer.
We considered MET detection as a token classification task, where labels are assigned to each token in a text sequence, similar to Named Entity Recognition (NER) models. However, instead of including multiple entities like other existing NER models, we included only the “target” entity. Therefore, using the IOB2 (Inside, Outside, and Beginning) format to label the tokens, there are a total of three entity classes for each word token: (i) B-TARGET, indicating the beginning of the target entity; (ii) I-TARGET, indicating that the token belongs to the target entity; and (iii) O, indicating that the token does not belong to any target entity. To allow the fine-tuned models to assign labels to each token in an input sequence, we added a token classification layer to the model.
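The IOB2 scheme described above can be sketched as follows. The `iob2_labels` helper and its word-level matching are a simplified illustration of ours; in practice, the labels must also be aligned to the subword tokens produced by each model's tokenizer:

```python
def iob2_labels(tokens, target_tokens):
    """Assign IOB2 labels: B-TARGET for the first token of the MET span,
    I-TARGET for the remaining tokens of the span, and O elsewhere."""
    labels = ["O"] * len(tokens)
    n = len(target_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target_tokens:
            labels[i] = "B-TARGET"
            for j in range(i + 1, i + n):
                labels[j] = "I-TARGET"
            break  # the MET appears literally once in the text
    return labels
```

For example, for the tokens `["El", "Banco", "Santander", "sube"]` and the MET `["Banco", "Santander"]`, the helper yields `["O", "B-TARGET", "I-TARGET", "O"]`.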
When annotating the datasets, we considered that the sentiment of the entities is represented by different emotional states, such as positive, neutral, and negative. Therefore, the goal of the SA model is to assign one of the three categories to a given sequence, indicating the sentiment it represents. In Figure 4, we can see that the architecture of the SA model is similar to that of MET detection, but in this case, a sequence classification is added, which is responsible for assigning labels to text sequences based on the available categories.
Recent developments in LLMs have demonstrated their success in many NLP tasks across different languages. For Spanish, there are several monolingual and multilingual models based on BERT [13], RoBERTa [14], ALBERT [15], DistilBERT [16], and XLM-RoBERTa [17], pretrained on different text sources. Therefore, we evaluated the performance of different pretrained LLMs for the tasks of MET detection and SA.
The following models were evaluated: (i) BETO [18], an LLM model based on BERT and trained with the Spanish Unannotated Corpora; (ii) MarIA [19], based on RoBERTa and pretrained with web crawlings from the National Library of Spain; (iii) BERTIN [20], which is another model based on RoBERTa but trained with the Spanish part of the mC4 dataset; (iv) ALBETO [21], a version of ALBERT, which is a lightweight version of BERT, pretrained only with documents written in Spanish; (v) DistilBETO (DBETO) [21], a version of DistilBERT (another lightweight version of BERT), trained by using distillation techniques to transfer the weights of BETO to a new model with fewer layers and less complexity; (vi) RoBERTuito [22], a pretrained language model for user-generated content in Spanish, trained following RoBERTa guidelines on 500 million tweets; and (vii) XLM-RoBERTa [17], a multilingual version of RoBERTa, trained with data filtered from CommonCrawl from 100 different languages.
The hyperparameters used to fine-tune the pretrained models for MET detection are a train batch size of 16, 10 epochs, a weight decay of 0.01, and a learning rate of 2 × 10⁻⁵. For sentiment detection, we performed hyperparameter tuning of the LLMs, evaluating the learning rate, number of training epochs, batch size, warmup steps, and weight decay. This experiment was performed by using the Ray Tune tool [23]. The number of models evaluated is 10 for each LLM and target. The parameters were selected by using Distributed Asynchronous Hyperparameter Optimization (HyperOptSearch) with the Tree of Parzen Estimators (TPE) [24] and the ASHA scheduler. Table 2 shows the best configuration for each LLM and target. Note that the best configuration is different for each target. For example, the number of training epochs is larger when the target is society, and the warmup steps are smaller when the target is other companies.
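The shape of such a search can be illustrated with a simplified random-search stand-in. The actual experiments use Ray Tune with HyperOptSearch (TPE) and the ASHA scheduler; the value grids below are hypothetical, not the configurations from the paper:

```python
import random

# Hypothetical search space over the hyperparameters listed in the text.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "num_train_epochs": [3, 4, 5, 6, 8, 10],
    "per_device_train_batch_size": [8, 16, 32],
    "warmup_steps": [0, 100, 250, 500],
    "weight_decay": [0.0, 0.01, 0.1],
}

def random_search(objective, n_trials=10, seed=0):
    """Evaluate n_trials sampled configurations and keep the best score
    (e.g., the validation macro F1 of a fine-tuned model)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

TPE improves on this sketch by modeling which regions of the space yield good scores, and ASHA additionally stops unpromising trials early, which is why 10 trials per LLM and target suffice.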

3.2.2. Strategy B: MET Detection Using Token Classification and Sentiment Classification Using Multilabel Classification

In the first approach (see Section 3.2.1), we created an SA model for each target, which results in a higher computational cost and longer inference time. To address this issue, in this approach, we kept MET detection as a token classification model and approached the multitarget SA problem from a multilabel perspective. In a multilabel classification problem, each text or tweet can be associated with multiple labels, which means that an instance can belong to more than one category at the same time, as the categories are not mutually exclusive.
To achieve multilabel classification, we transformed the labels of the dataset into a “one-hot” representation, considering the sentiment of a tweet as a set of emotional states associated with each entity. Thus, the label set is “target_pos”, “target_neu”, “target_neg”, “other_pos”, “other_neu”, “other_neg”, “society_pos”, “society_neu”, and “society_neg”, where each label represents the emotional state of a specific target. Labels with the prefix “other” indicate the sentiments toward companies, labels with the prefix “society” refer to the society (consumer) entity, and labels with the prefix “target” refer to the sentiments toward the MET.
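A minimal sketch of this one-hot encoding, assuming the nine labels above and a fixed target-major ordering (the actual label ordering used in the paper is not specified):

```python
SENTIMENTS = ("pos", "neu", "neg")
TARGETS = ("target", "other", "society")
# Nine labels: target_pos, target_neu, ..., society_neg
LABELS = [f"{t}_{s}" for t in TARGETS for s in SENTIMENTS]

def to_one_hot(target, other, society):
    """Encode the per-target sentiments of one tweet as a 9-dimensional
    binary vector with exactly one active label per target."""
    active = {f"target_{target}", f"other_{other}", f"society_{society}"}
    return [1 if label in active else 0 for label in LABELS]
```

For example, a tweet that is positive toward the MET, neutral toward other companies, and negative toward society maps to `[1, 0, 0, 0, 1, 0, 0, 0, 1]`.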
Table 3 shows the output of the multilabel model on a set of three simple examples. In this approach, we evaluated the same pretrained models as in the previous approach (see Section 3.2.1). For multilabel classification, we performed an epoch-based hyperparameter search; that is, we kept variables such as weight decay and learning rate fixed while varying the number of epochs. This allows the model to be evaluated after each epoch, thus selecting the best checkpoint. The hyperparameters used to fine-tune the pretrained models for multilabel classification are a train batch size of 8, 6 epochs, a weight decay of 0.01, and a learning rate of 2 × 10⁻⁵.

3.2.3. Strategy C: Multitarget Sentiment Classification Model

An inconvenience of the previous approaches (see Section 3.2.1 and Section 3.2.2) is that at least two models are required, one for MET detection and another for SA, which still implies a high computational burden. To address this issue, we propose an MTSA approach that involves building a sequence-to-sequence (Seq2Seq) model capable of simultaneously providing MET and sentiment for different targets given a single input.
Seq2Seq models, also known as encoder–decoder models, use both parts of the Transformer architecture, i.e., both the encoder and the decoder. The attention layers of the encoder can access all words in the input sentence, while the attention layers of the decoder can only access the words positioned before the current word in the output sequence. These models are commonly used for natural language processing tasks that involve understanding input sequences and generating output sequences, often of different lengths and structures. Common use cases for encoder–decoder models include text translation and summarization. In addition, these models offer flexibility in their responses because they do not rely on a sequence classification layer, as encoder models do for classification, but instead rely on the decoding mechanism, which allows the output format to be defined freely.
We based our approach on the REBEL framework [25], which uses mBART to extract entities and relationships from text. In this case, we used different seq2seq models such as mBART [26] and mT5 [27] as a base model and fine-tuned them to make the decoder generate the appropriate response, which in this case is the MET of the text and the sentiments of the respective targets.
On the one hand, mBART is an encoder–decoder model pretrained on large monolingual corpora in multiple languages by using the BART objective. mBART is one of the first methods to pretrain a full sequence-to-sequence model by denoising complete texts in multiple languages, while other models focus only on the encoder, the decoder, or the reconstruction of parts of the text. The mT5 model, on the other hand, is based on T5 (Text-to-Text Transfer Transformer) and has been pretrained on a CommonCrawl-based dataset covering 101 languages.
In order to extract the MET and the sentiments for each target within a single sequence, we expressed the output of the model as a sequence of tokens so that we could easily retrieve each feature and minimize the number of tokens to generate, making the decoding process more efficient. To accomplish this, we introduced a number of new tokens into the seq2seq model that act as markers. The token <absa> marks the beginning of MET. The token <target> marks the end of MET and the start of the sentiment for the MET entity. The token <companies> marks the end of the sentiment for the MET entity and the beginning of the sentiment for the companies’ entity. On the other hand, the token <consumers> marks the end of the sentiment for the companies entity and the beginning of the sentiment for the society entity.
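Once the decoder emits such a marker-delimited sequence, the MET and the three sentiments can be recovered with a small parser. This sketch of ours assumes the markers appear in the fixed order described above and that each sentiment is a single word:

```python
import re

# Marker order: <absa> MET <target> met_sent <companies> companies_sent
#               <consumers> society_sent
PATTERN = re.compile(
    r"<absa>\s*(?P<met>.*?)\s*<target>\s*(?P<met_sent>\w+)\s*"
    r"<companies>\s*(?P<companies_sent>\w+)\s*<consumers>\s*(?P<society_sent>\w+)"
)

def parse_output(sequence):
    """Recover the MET and the three per-target sentiments from the
    marker-delimited sequence generated by the decoder; None if malformed."""
    m = PATTERN.search(sequence)
    return None if m is None else m.groupdict()
```

Returning `None` for malformed generations matters in practice, since a fine-tuned decoder is not formally guaranteed to emit the markers in order.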
The hyperparameters used to fine-tune the seq2seq models, both mBART and mT5, are as follows: a train batch size of 4, a weight decay of 0.01, a learning rate of 2 × 10⁻⁵, 6 epochs for mBART, and 24 epochs for mT5. An epoch-based evaluation approach was used, where the model is evaluated after each epoch so that the best-performing checkpoint can be selected.
Figure 5 shows three examples of MTSA with the mBART model. We can observe that the model is able to identify the MET and assign the corresponding sentiment to each target from an input text.

4. Results

In this section, we explore the capabilities and limitations of the different deep learning approaches that we evaluated for MTSA. To this end, we used standard NLP metrics, namely precision (P), recall (R), and the macroaverage F1 score. Since MTSA involves two goals, target detection and sentiment identification for different entities, we averaged the F1 scores (Mean F1) of MET detection and of the sentiment classification for each target, giving them equal weight. This allows us to compare the overall performance of the different approaches.
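The Mean F1 aggregation reduces to an equal-weight average. A minimal sketch, where the per-class macro averaging is the standard definition and is assumed to match the paper's setup:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_pr):
    """Unweighted mean of per-class F1 scores, given (precision, recall) pairs."""
    return sum(f1(p, r) for p, r in per_class_pr) / len(per_class_pr)

def mean_f1(subtask_f1s):
    """Equal-weight average over the four subtasks: MET detection plus
    sentiment classification for the MET, companies, and society."""
    return sum(subtask_f1s) / len(subtask_f1s)
```

For instance, subtask F1 scores of 0.8, 0.7, 0.8, and 0.9 yield a Mean F1 of 0.8.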

4.1. Strategy A: MET Detection Using Token Classification and Sentiment Classification Using Multiclass Classification

The results of the evaluation of different pretrained models for MET detection and the multiclass sentiment classification of the different entities are shown in Table 4. In the experiments conducted for MET detection, BERT-based models such as BETO, ALBETO, and DBETO achieved better results than models based on RoBERTa. In terms of the macro F1 score, BETO achieved the best result with 83.584%, and lightweight models such as ALBETO and DBETO obtained competitive results with 80.719% and 80.176%, respectively, surpassing more complex models trained on larger corpora such as MarIA and BERTIN. Furthermore, we can observe that monolingual models for Spanish generally perform better than multilingual models such as XLM-RoBERTa, with the exception of RoBERTuito, which was pretrained on a less extensive corpus.
In this approach, we considered the SA problem for the different entities as a multiclass problem, so we evaluated the fine-tuning of different pretrained models for each of the targets. In terms of performance, the MarIA model achieved the best results in sentiment classification for the three entities, with macro F1 scores of 73.837%, 80.357%, and 78.965%, respectively.
If we compare the overall performance, i.e., the Mean F1 score, we can see that BETO has the best result with 79.990%; although its SA models score lower than MarIA’s, its overall lead comes from its performance in MET detection. MarIA has the second-best result with 77.940%, surpassing the multilingual model (XLM-RoBERTa). We can also observe that the lightweight models (ALBETO and DBETO) achieve competitive results with 77.002% and 76.635%, respectively, outperforming more complex models such as BERTIN and RoBERTuito.
Table 5 and Table 6 show the classification reports for MET detection and SA using the model that achieved the best Mean F1 (BETO). In Table 5, we can see that BETO achieved a precision of 82.92% and a recall of 84.25% for MET detection. On the other hand, MarIA obtained the best F1 for the sentiment of each target. It can be seen that all models, including the lightweight models ALBETO and DBETO, are very competitive. The only exception is RoBERTuito, which obtained limited results. The main reason may be that this LLM was pretrained on general Spanish tweets; finance is a complex domain, and short tweets provide little domain-specific signal, which is why the model underperformed in the MET detection task with an F1 score of 59.89%.
Table 6 shows the classification report for the BETO model when extracting the sentiment polarity of each target. It can be seen that the precision, recall, and F1 scores are similar. The most limited result is obtained for the neutral class of the MET, with precision, recall, and F1 scores of 50.230%, 47.391%, and 48.770%, respectively. Looking at the confusion matrix of this model, we notice that the false predictions for the neutral class were split roughly evenly between positive and negative examples, highlighting the difficulty of determining which texts have a neutral polarity toward the MET.

4.2. Strategy B: MET Detection Using Token Classification and Sentiment Classification Using Multilabel Classification

This approach to the MTSA system treats the identification of METs as a token classification problem, as in the previous approach (see Section 3.2.1), and the SA of the different targets as a multilabel problem. Therefore, no separate SA model is needed for each entity: a single multilabel model covering the sentiments of all entities suffices, since the classes are not mutually exclusive. Table 7 shows that the MET detection scores of the different pretrained models from the first approach (see Section 3.2.1) are retained, and the same models were evaluated for multilabel sentiment classification.
In terms of the macro F1 score, MarIA achieved the best multilabel classification result with 78.510%. Lighter models such as ALBETO and DBETO achieved lower results, with 73.194% and 72.737%, respectively. The XLM-RoBERTa model performed better than the lightweight models (74.090% macro F1 score) but worse than RoBERTuito, a Spanish model trained on a much smaller corpus.
In terms of overall performance, BETO achieved the best result with 80.023%; although it is not the best model for SA, its stronger MET detection gives it the best overall score. The classification reports for MET detection and sentiment analysis are shown in Table 5 and Table 8. In Table 5, we can see that BETO achieved a precision of 82.925% and a recall of 84.254% for MET detection. Table 8 shows the precision, recall, and F1 score obtained for each sentiment polarity. The model obtained an F1 score greater than 72% for all polarities of the different targets, except for the neutral polarity of the MET, where it obtained an F1 score of 48.107%.
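Conceptually, the multilabel formulation assigns one binary slot to each (target, polarity) pair, so a single classifier predicts the sentiment of every entity at once. The exact label layout is not detailed in the text, so the following encoding sketch is illustrative:

```python
# Illustrative multi-hot encoding for the multilabel formulation:
# one binary slot per (target, polarity) pair, 9 classes in total.
TARGETS = ["target", "companies", "society"]
POLARITIES = ["negative", "neutral", "positive"]
CLASSES = [f"{t}_{p}" for t in TARGETS for p in POLARITIES]

def encode(sentiments):
    """Map {'target': 'positive', ...} to a 9-dimensional multi-hot vector."""
    return [1 if sentiments.get(t) == p else 0
            for t in TARGETS for p in POLARITIES]

# Example 2 from Table 3 (PepsiCo tweet): POS / NEU / NEU
vec = encode({"target": "positive", "companies": "neutral", "society": "neutral"})
print(vec)  # [0, 0, 1, 0, 1, 0, 0, 1, 0]
```

In the gold labels exactly one polarity per target is active, but across the nine classes three slots are set at once, which is why the classes are not mutually exclusive and a multilabel head fits the task.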

4.3. Strategy C: Multitarget Sentiment Classification Model

Unlike the other approaches, which require at least two models for MTSA, this approach uses a seq2seq model as a base and adapts it to generate the MET and the sentiments of the different targets as a single response sequence.
In this paper, we evaluated mBART and mT5. Table 9 shows the overall performance of the two models and their results on each task. Regarding MET detection, mBART achieved the best result with a macro F1 score of 85.053%. It also achieved the best results in target and companies SA, with macro F1 scores of 71.936% and 78.368%, respectively. The mT5 model achieved the best result in society SA with a macro F1 score of 76.875%. According to the Mean F1 score metric, the mBART model achieved the best overall performance with 80.300%.
Table 10 shows the MET detection classification report for the mBART model. It achieved a precision of 85.455% and a recall of 84.654%, surpassing the best MET detection model from the two previous approaches, BETO (see Table 5). Regarding SA with the mBART model, Table 11 shows the precision, recall, and F1 score obtained for the different sentiment polarities. The model obtained an F1 score greater than 72% for all polarities of the different targets, except for the neutral polarity of the MET, where it obtained an F1 score of 50.673%.
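Because the seq2seq models emit free text, the MET and the per-target sentiments must be parsed back out of the generated sequence. The delimited serialization below is an assumed format for illustration (the paper does not specify the exact output template); a field the model fails to emit simply stays null, which is how missing sentiments can arise:

```python
def parse_response(seq):
    """Parse a generated sequence of the assumed form
    'met: <entity> | target: <pol> | companies: <pol> | society: <pol>'
    into a dictionary; fields the model failed to emit stay None."""
    out = {"met": None, "target": None, "companies": None, "society": None}
    for chunk in seq.split("|"):
        key, _, value = chunk.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in out and value:
            out[key] = value
    return out

resp = parse_response("met: eurozona | target: positive | "
                      "companies: neutral | society: neutral")
print(resp["met"], resp["society"])  # eurozona neutral
```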

5. Discussion

This section compares the best model of each approach, discusses their limitations, and assesses which approach is most suitable for multitarget sentiment analysis.
Table 12 compares the best model of each approach. The table summarizes the Mean F1, which refers to overall performance; the F1-MET, which is the macro F1 score of the MET detection model; and the F1-Sentiment, which is the average F1 score of the SA models over the targets. The approach using a seq2seq model for MTSA gave the best result, with a Mean F1 score of 80.300%, followed by the approach combining a MET detection model with a multilabel sentiment classification model, which achieved 80.023%.
In terms of MET identification, mBART achieved the best result with a macro F1 score of 85.053%, approximately 1.5 points better than the BETO-based model. In terms of SA, the multitarget approach produced the worst result with a macro F1 score of 75.548%, while the multilabel classification approach based on BETO produced the best result with 76.461%.
Although mBART is the heaviest model, it requires less inference time because only one model has to be loaded. However, being a seq2seq model, it sometimes fails to extract the sentiment of certain targets, leaving the sentiment null. In the test set, for example, the model failed to produce sentiments for 12 target instances out of 1404, an error rate of 0.85%. To address this issue, we replaced null sentiments with the most common sentiment from the training set, which may explain why the model performs about 1 point worse than the BETO-based fine-tuned SA model. We therefore performed an error analysis of the mBART-based system for identifying the sentiment of the different entities, examining the misclassified test-set instances for each entity. The confusion matrices of this system for each entity are shown in Figure 6a–c.
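The null-sentiment fallback described above can be sketched as follows. The function name is ours; the majority class used in the example ('neutral' for the companies target) follows from the training-split counts in Table 1:

```python
from collections import Counter

def fill_null_sentiments(predictions, train_labels):
    """Replace null (None) predictions with the most common
    sentiment observed in the training split."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [p if p is not None else majority for p in predictions]

# Companies target, training split (Table 1): 2090 NEU, 1168 NEG, 954 POS
train = ["neutral"] * 2090 + ["negative"] * 1168 + ["positive"] * 954
print(fill_null_sentiments(["positive", None, "negative"], train))
# ['positive', 'neutral', 'negative']
```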
From the confusion matrices, we can see that the model is not prone to severe errors, such as labeling a positive text as negative or vice versa. For the target entity (see Figure 6a), the model mislabels 9.36% of the negative and 8.27% of the positive texts, and misclassifies 27.39% of the neutral texts as positive. For the companies entity, the model misclassifies 18.28% of the negative texts and 17.25% of the neutral texts. For the society target, the behavior is similar to that for companies, with 17.59% of the negative texts and 20.12% of the positive texts misclassified as neutral.
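The percentages above are row-normalized confusion-matrix entries: each cell is divided by the number of true examples of its row's class. A small sketch with made-up counts:

```python
def row_normalize(matrix):
    """Convert raw confusion-matrix counts into per-class percentages
    (each row sums to 100 for classes with at least one example)."""
    return [[100.0 * cell / sum(row) if sum(row) else 0.0 for cell in row]
            for row in matrix]

# Hypothetical counts; rows/columns ordered negative, neutral, positive
cm = [[90, 6, 4],
      [10, 70, 20],
      [3, 7, 90]]
norm = row_normalize(cm)
print(norm[1][2])  # 20.0 -> share of neutral texts predicted as positive
```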

6. Conclusions

MTSA enriches the analysis of textual data by providing a more granular understanding of sentiment, as it extracts the subjective polarity of a text toward multiple targets simultaneously in a given context. This approach is useful in the financial field because it captures opinions about different entities that affect the main actor differently, including other companies and society. To this end, we evaluated three system design approaches: (i) a MET detection model based on token classification plus multiclass classification models for the SA of the different entities; (ii) a MET detection model based on token classification plus a single multilabel classification model for SA; and (iii) seq2seq models (mBART and mT5) fine-tuned to return a response sequence that includes the MET and the sentiments of the different targets.
The multiclass and multilabel classification approaches are based on fine-tuning and evaluating different pretrained models, both monolingual and multilingual, and both full-size and lightweight. BETO obtained the best overall performance in the first two architectures, with Mean F1 scores of 79.99% and 80.02%, respectively. The third architecture, based on fine-tuning an mBART model, obtained the best performance overall, with a Mean F1 of 80.300% and a MET detection F1 of 85.053%, outperforming the other approaches. Based on the computational resources required by each approach and the performance obtained, we consider fine-tuned mBART the best approach to the MTSA problem, since it requires only one model and scales better to additional targets.
As a promising line of future research, we propose to focus on the interpretability of the results. A limitation of our pipeline is that we only address the interpretability of the linguistic features applied to the targets; this analysis is model agnostic and does not consider the behavior of the evaluated models. We therefore propose to evaluate techniques such as LIME or SHAP to gain insight into the predictions obtained. Furthermore, we want to study the effects of figurative language in economics and how it affects sentiment polarity; to do so, we will use available datasets [28] to find clues in the texts that could indicate satire, irony, or parody. In addition, we will investigate the effects of offensive content in finance and economics [29], as hostile environments are expected to correlate negatively with sentiment in finance. Another improvement is the inclusion of cross-validation in our pipeline to avoid the bias of selecting the best model with a fixed validation split.

Finally, since the sentiment analysis model was trained on data from various official accounts, and text features, syntax, and writing style are fairly uniform across platforms, it should generalize to text or news from other platforms. However, the model was trained primarily on short texts from microblogging platforms such as X (formerly Twitter) and on news headlines, so an interesting direction for future work is to evaluate it on text extracted from long articles and financial reports.
The source code of the project is available at https://github.com/NLP-UMUTeam/Spanish-MTSA-2023.

Author Contributions

Conceptualization, J.A.G.-D. and R.V.-G.; data curation, R.P.; funding acquisition, R.V.-G.; investigation, R.P.; project administration, R.V.-G.; resources, R.V.-G.; software, J.A.G.-D. and R.P.; supervision, R.V.-G.; visualization, J.A.G.-D.; writing—original draft, all. All authors have read and agreed to the published version of the manuscript.

Funding

This work is part of the research project AIInFunds (PDC2021-121112-I00) funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Brauwers, G.; Frasincar, F. A Survey on Aspect-Based Sentiment Classification. ACM Comput. Surv. 2023, 55, 65:1–65:37. [Google Scholar] [CrossRef]
  2. Pan, R.; García-Díaz, J.A.; Garcia-Sanchez, F.; Valencia-García, R. Evaluation of transformer models for financial targeted sentiment analysis in Spanish. PeerJ Comput. Sci. 2023, 9, e1377. [Google Scholar] [CrossRef]
  3. Garcia-Díaz, J.A.; Almela, Á.; García-Sánchez, F.; Alcaraz-Mármol, G.; Marín, M.J.; Valencia-García, R. Overview of FinancES 2023: Financial Targeted Sentiment Analysis in Spanish. Proces. Del Leng. Nat. 2023, 71, 417–423. [Google Scholar]
  4. Renault, T. Sentiment analysis and machine learning in finance: A comparison of methods and models on one million messages. Digit. Financ. 2020, 2, 1–13. [Google Scholar] [CrossRef]
  5. Mishev, K.; Gjorgjevikj, A.; Vodenska, I.; Chitkushev, L.T.; Trajanov, D. Evaluation of sentiment analysis in finance: From lexicons to transformers. IEEE Access 2020, 8, 131662–131682. [Google Scholar] [CrossRef]
  6. Consoli, S.; Barbaglia, L.; Manzan, S. Fine-grained, aspect-based sentiment analysis on economic and financial lexicon. Knowl.-Based Syst. 2022, 247, 108781. [Google Scholar] [CrossRef]
  7. Araci, D. Finbert: Financial sentiment analysis with pre-trained language models. arXiv 2019, arXiv:1908.10063. [Google Scholar]
  8. Leippold, M. Sentiment spin: Attacking financial sentiment with GPT-3. Financ. Res. Lett. 2023, 55, 103957. [Google Scholar] [CrossRef]
  9. Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P. Good debt or bad debt: Detecting semantic orientations in economic texts. J. Assoc. Inf. Sci. Technol. 2014, 65, 782–796. [Google Scholar] [CrossRef]
  10. García-Díaz, J.A.; García-Sánchez, F.; Valencia-García, R. Smart analysis of economics sentiment in Spanish based on linguistic features and transformers. IEEE Access 2023, 11, 14211–14224. [Google Scholar] [CrossRef]
  11. García-Díaz, J.A.; Almela, Á.; Alcaraz-Mármol, G.; Valencia-García, R. UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks. Proces. Del Leng. Nat. 2020, 65, 139–142. [Google Scholar]
  12. García-Díaz, J.A.; Vivancos-Vicente, P.J.; Almela, A.; Valencia-García, R. Umutextstats: A linguistic feature extraction tool for spanish. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; pp. 6035–6044. [Google Scholar]
  13. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  14. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  15. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
  16. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
  17. Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-lingual Representation Learning at Scale. arXiv 2019, arXiv:1911.02116. [Google Scholar]
  18. Cañete, J.; Chaperon, G.; Fuentes, R.; Ho, J.H.; Kang, H.; Pérez, J. Spanish Pre-Trained BERT Model and Evaluation Data. In Proceedings of the PML4DC at ICLR 2020, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
  19. Fandiño, A.G.; Estapé, J.A.; Pàmies, M.; Palao, J.L.; Ocampo, J.S.; Carrino, C.P.; Oller, C.A.; Penagos, C.R.; Agirre, A.G.; Villegas, M. MarIA: Spanish Language Models. Proces. Del Leng. Nat. 2022, 68, 39–60. [Google Scholar] [CrossRef]
  20. De la Rosa, J.; Ponferrada, E.G.; Villegas, P.; de Prado Salas, P.G.; Romero, M.; Grandury, M. BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling. Proces. Del Leng. Nat. 2022, 68, 13–23. [Google Scholar]
  21. Cañete, J.; Donoso, S.; Bravo-Marquez, F.; Carvallo, A.; Araujo, V. Albeto and distilbeto: Lightweight spanish language models. arXiv 2022, arXiv:2204.09145. [Google Scholar]
  22. Pérez, J.M.; Furman, D.A.; Alonso Alemany, L.; Luque, F.M. RoBERTuito: A pre-trained language model for social media text in Spanish. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; pp. 7235–7243. [Google Scholar]
  23. Liaw, R.; Liang, E.; Nishihara, R.; Moritz, P.; Gonzalez, J.E.; Stoica, I. Tune: A research platform for distributed model selection and training. arXiv 2018, arXiv:1807.05118. [Google Scholar]
  24. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. In Proceedings of the Advances in Neural Information Processing Systems 24 (NIPS 2011), Granada, Spain, 12–15 December 2011. [Google Scholar]
  25. Huguet Cabot, P.L.; Navigli, R. REBEL: Relation Extraction by End-to-end Language generation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2370–2381. [Google Scholar]
  26. Tang, Y.; Tran, C.; Li, X.; Chen, P.J.; Goyal, N.; Chaudhary, V.; Gu, J.; Fan, A. Multilingual Translation with Extensible Multilingual Pretraining and Finetuning. arXiv 2020, arXiv:2008.00401. [Google Scholar]
  27. Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. mT5: A massively multilingual pre-trained text-to-text transformer. arXiv 2020, arXiv:2010.11934. [Google Scholar]
  28. García-Díaz, J.A.; Valencia-García, R. Compilation and evaluation of the spanish saticorpus 2021 for satire identification using linguistic features and transformers. Complex Intell. Syst. 2022, 8, 1723–1736. [Google Scholar] [CrossRef]
  29. García-Díaz, J.A.; Jiménez-Zafra, S.M.; García-Cumbreras, M.A.; Valencia-García, R. Evaluating feature combination strategies for hate-speech detection in spanish using linguistic features and transformers. Complex Intell. Syst. 2023, 9, 2893–2914. [Google Scholar] [CrossRef]
Figure 1. Overall architecture of the system.
Figure 2. Example of a tweet of the Spanish MTSA 2023 corpus. The English text of this tweet is Employment reaches a new record high in the eurozone and the EU, despite the economic slowdown.
Figure 3. Information gain concerning the different targets.
Figure 4. Architecture of the MET detection and sentiment classification system.
Figure 5. Illustration of the mBART model with three simplified examples. The first text is about the year-on-year growth of Chinese tourists in Spain in July. The second text shows that PepsiCo will invest 31 million euros to build a new gazpacho factory. The third text is that Mercedes is preparing to attack Tesla by investing 1 billion euros in the electric car market.
Figure 6. Confusion matrices using mBART in test split.
Table 1. Spanish MTSACorpus 2023 statistics per target.
| Target | Label | Train | Val | Test | Total |
|---|---|---|---|---|---|
| MET | Positive | 1848 | 658 | 629 | 3135 |
| MET | Neutral | 692 | 214 | 230 | 1136 |
| MET | Negative | 1672 | 532 | 545 | 2749 |
| Companies | Positive | 954 | 329 | 342 | 1625 |
| Companies | Neutral | 2090 | 702 | 679 | 3471 |
| Companies | Negative | 1168 | 373 | 383 | 1924 |
| Society | Positive | 948 | 334 | 338 | 1620 |
| Society | Neutral | 1977 | 653 | 651 | 3281 |
| Society | Negative | 1287 | 417 | 415 | 2119 |
| Total | | 4212 | 1404 | 1404 | 7020 |
Table 2. Results of the hyperparameter tuning for the multiclass classification model of sentiments. The parameters are the learning rate (LR, units: ×10−5), the number of training epochs (E, units: integer), the batch size (B, units: integer), the warm-up steps (WS, units of 1k), and the weight decay (WD, units: float).
Target:

| LLM | LR | E | B | WS | WD |
|---|---|---|---|---|---|
| ALBETO | 4 | 4 | 16 | 0.5 | 0.19 |
| BERTIN | 3.6 | 3 | 16 | 0 | 0.27 |
| BETO | 1.7 | 5 | 8 | 0 | 0.03 |
| DBETO | 4.1 | 3 | 8 | 0.5 | 0.26 |
| MarIA | 2.6 | 3 | 8 | 1 | 0.05 |
| RoBERTuito | 2 | 6 | 16 | 0.5 | 0.01 |
| XLM | 2 | 4 | 8 | 0 | 0.09 |

Companies:

| LLM | LR | E | B | WS | WD |
|---|---|---|---|---|---|
| ALBETO | 4.2 | 5 | 8 | 0.25 | 0.20 |
| BERTIN | 3 | 4 | 16 | 0.5 | 0.04 |
| BETO | 2.7 | 3 | 8 | 0.25 | 0.11 |
| DBETO | 4.9 | 3 | 8 | 0 | 0.003 |
| MarIA | 1.7 | 3 | 8 | 0.25 | 0.19 |
| RoBERTuito | 2 | 6 | 16 | 0.5 | 0.01 |
| XLM | 4.2 | 3 | 16 | 0.25 | 0.12 |

Society:

| LLM | LR | E | B | WS | WD |
|---|---|---|---|---|---|
| ALBETO | 3.1 | 5 | 8 | 1 | 0.016 |
| BERTIN | 2.9 | 5 | 8 | 1 | 0.028 |
| BETO | 2.9 | 3 | 8 | 0.25 | 0.15 |
| DBETO | — | 3 | 8 | 0 | 0.035 |
| MarIA | 2.9 | 2 | 8 | 0.25 | 0.02 |
| RoBERTuito | 2 | 6 | 16 | 0.5 | 0.01 |
| XLM | 3.2 | 5 | 8 | 0 | 0.13 |
Table 3. Illustration of the multilabel sentiment classification model with three simplified examples.

| # | Text | Target | Companies | Society |
|---|---|---|---|---|
| 1 | Turistas chinos en España en Julio 74,367, crecimiento interanual 51%, YTD 309,742, interanual 60% cc @josecdiez @migsebastiang @plalanda_II. (Chinese tourists in Spain in July 74,367, year-on-year growth 51%, YTD 309,742, year-on-year 60% cc @josecdiez @migsebastiang @plalanda_II.) | POS | POS | POS |
| 2 | PepsiCo invertirá 31 millones de euros para construir una nueva fábrica de gazpacho Alvalle. (PepsiCo will invest 31 million euros to build a new Alvalle gazpacho factory.) | POS | NEU | NEU |
| 3 | Mercedes prepara el ataque a Tesla: invertirá 1.000 millones para asaltar ‘su’ mercado eléctrico… (Mercedes prepares attack on Tesla: will invest 1 billion to assault ‘its’ electric market…) | NEG | NEU | NEU |
Table 4. Benchmark of the different LLMs and multiclass sentiment classification with test splits evaluated for MTSA. Mean F1 is the average of F1 MET and F1 Sentiment. F1 Sentiment is the average of the sentiment of the MET, companies, and society.
| LLM | Mean F1 | F1 MET | F1 Sentiment (MET, Companies, Society) |
|---|---|---|---|
| ALBETO | 77.002 | 80.719 | 73.284 (67.936, 76.963, 74.954) |
| BERTIN | 75.330 | 76.693 | 73.967 (69.932, 75.602, 76.368) |
| BETO | 79.990 | 83.584 | 76.395 (72.048, 78.885, 78.253) |
| DBETO | 76.635 | 80.176 | 73.094 (68.075, 76.482, 74.724) |
| MarIA | 77.940 | 78.161 | 77.720 (73.837, 80.357, 78.965) |
| RoBERTuito | 59.886 | 46.386 | 73.751 (70.376, 75.857, 75.019) |
| XLM | 76.243 | 78.978 | 73.507 (69.924, 75.577, 75.020) |
Table 5. Precision (P), recall (R), and F1 score (F1) for the MET using the model with the best Mean F1 score (BETO).

| Label | P | R | F1 |
|---|---|---|---|
| Target (MET) | 82.925 | 84.254 | 83.584 |
Table 6. Sentiment analysis classification report for each target in Strategy A. The selected model is BETO, which achieved the best average F1 score. For each target, the precision (P), recall (R), and F1 score (F1) are reported.
| Label | Target P | Target R | Target F1 | Companies P | Companies R | Companies F1 | Society P | Society R | Society F1 |
|---|---|---|---|---|---|---|---|---|---|
| Negative | 83.932 | 81.468 | 82.682 | 77.624 | 73.368 | 75.436 | 77.674 | 80.482 | 79.053 |
| Neutral | 50.230 | 47.391 | 48.770 | 81.241 | 84.831 | 82.997 | 81.558 | 78.802 | 80.156 |
| Positive | 82.827 | 86.645 | 84.693 | 79.279 | 77.193 | 78.222 | 74.783 | 76.331 | 75.549 |
Table 7. Benchmark of the different LLMs and multilabel sentiment classification with test splits evaluated for MTSA.
| LLM | Mean F1 | MET P | MET R | MET F1 | Sentiment P | Sentiment R | Sentiment F1 |
|---|---|---|---|---|---|---|---|
| ALBETO | 78.336 | 80.082 | 81.367 | 80.719 | 74.675 | 72.078 | 73.194 |
| BETO | 80.022 | 82.925 | 84.254 | 83.584 | 77.685 | 75.398 | 76.461 |
| BERTIN | 76.957 | 77.502 | 75.901 | 76.693 | 76.683 | 74.700 | 75.523 |
| DBETO | 76.108 | 79.729 | 80.628 | 80.176 | 74.384 | 71.395 | 72.737 |
| MARIA | 76.457 | 78.854 | 77.481 | 78.161 | 79.668 | 77.510 | 78.510 |
| RoBERTuito | 60.955 | 65.310 | 35.965 | 46.386 | 76.683 | 74.700 | 75.523 |
| XLM | 72.221 | 78.789 | 78.978 | 70.351 | 75.542 | 72.994 | 74.090 |
Table 8. Sentiment analysis classification report for each target in Strategy B. The selected model is BETO, which achieved the best average F1 score. For each target, the precision (P), recall (R), and F1 score (F1) are reported.
| Label | Target P | Target R | Target F1 | Companies P | Companies R | Companies F1 | Society P | Society R | Society F1 |
|---|---|---|---|---|---|---|---|---|---|
| Negative | 82.109 | 80.000 | 81.041 | 79.143 | 72.324 | 75.580 | 80.977 | 75.904 | 78.358 |
| Neutral | 49.315 | 46.957 | 48.107 | 80.441 | 86.009 | 83.132 | 79.592 | 83.871 | 81.675 |
| Positive | 85.096 | 84.420 | 84.757 | 81.505 | 76.023 | 78.669 | 80.984 | 73.077 | 76.827 |
Table 9. Benchmark of the different seq2seq models with test splits evaluated for MTSA. F1 Sentiment is the average of the sentiment of the MET, companies, and society.

| LLM | Mean F1 | F1 MET | F1 Sentiment (MET, Companies, Society) |
|---|---|---|---|
| mBART | 80.300 | 85.053 | 75.548 (71.936, 78.368, 76.340) |
| mT5 | 78.289 | 82.246 | 74.333 (68.977, 77.146, 76.875) |
Table 10. Precision (P), recall (R), and F1 score (F1) for the MET using the seq2seq model with the best Mean F1 score (mBART).

| Label | P | R | F1 |
|---|---|---|---|
| Target (MET) | 85.455 | 84.654 | 85.053 |
Table 11. Sentiment analysis classification report for each target in Strategy C. The selected model is mBART, which achieved the best Mean F1 score. For each target, the precision (P), recall (R), and F1 score (F1) are reported.

| Label | Target P | Target R | Target F1 | Companies P | Companies R | Companies F1 | Society P | Society R | Society F1 |
|---|---|---|---|---|---|---|---|---|---|
| Negative | 81.768 | 81.468 | 81.618 | 78.177 | 73.890 | 75.973 | 78.316 | 73.976 | 76.084 |
| Neutral | 52.315 | 49.130 | 50.673 | 81.571 | 84.094 | 82.814 | 79.173 | 82.335 | 80.723 |
| Positive | 82.481 | 84.579 | 83.517 | 76.316 | 76.316 | 76.316 | 72.537 | 71.894 | 72.214 |
Table 12. Comparison table of the best model among the three approaches. Mean F1 refers to the overall performance, F1-MET indicates the macro F1 score of the MET detection model, and F1-Sentiment indicates the average F1 score of the SA models for the entities. No. Model indicates the number of models required to address the MTSA problem and Size indicates the size required for the models.
| Approach | Model | Mean F1 | F1-MET | F1-Sentiment | No. Models | Size (GB) |
|---|---|---|---|---|---|---|
| MET + Multiclass | BETO | 79.990 | 83.584 | 76.395 | 4 | 1.76 |
| MET + Multilabel | BETO | 80.023 | 83.584 | 76.461 | 2 | 0.8 |
| Multitarget | mBART | 80.300 | 85.053 | 75.548 | 1 | 2.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pan, R.; García-Díaz, J.A.; Valencia-García, R. Individual- vs. Multiple-Objective Strategies for Targeted Sentiment Analysis in Finances Using the Spanish MTSA 2023 Corpus. Electronics 2024, 13, 717. https://doi.org/10.3390/electronics13040717
