A Bullet Screen Sentiment Analysis Method That Integrates the Sentiment Lexicon with RoBERTa-CNN

Liu, Yupan; Wang, Shuo; Yu, Shengshi

doi:10.3390/electronics13203984

Open Access Article


by Yupan Liu, Shuo Wang * and Shengshi Yu

Key Laboratory of Machine Learning and Computational Intelligence, Hebei University, Baoding 071002, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(20), 3984; https://doi.org/10.3390/electronics13203984

Submission received: 21 August 2024 / Revised: 3 October 2024 / Accepted: 4 October 2024 / Published: 10 October 2024

(This article belongs to the Special Issue New Advances in Affective Computing)


Abstract

Bullet screen comments, a form of online video commentary in emerging social media, are widely used on video websites popular with young people and have become a new way of expressing emotions towards videos. Their characteristics, such as widely varying text lengths and the presence of many new words, make their emotional information ambiguous. To address these characteristics, this paper proposes a sentiment classification algorithm that combines a Robustly Optimized BERT Pretraining Approach (RoBERTa) model and a Convolutional Neural Network (CNN) with a sentiment lexicon. RoBERTa encodes the input text to enhance its semantic feature representation, the CNN extracts local features using multiple convolutional kernels of different sizes, and a softmax classifier then performs the sentiment classification. In parallel, the sentiment lexicon is used to calculate an emotion score for the input text, which is then normalized. Finally, the classification results of the sentiment lexicon and of RoBERTa+CNN are combined by a weighted sum. The bullet screens are grouped by length, and the sentiment lexicon is assigned a different weight for each length group to strengthen the features used for sentiment classification. The method thus combines a sentiment lexicon, which can be customized with domain vocabulary, and a pre-trained model, which can handle polysemy. Experimental results show that the proposed method improves precision, recall, and F1 score. The experiments take the Russia–Ukraine war as the research topic, and the method can be extended to other events. The results demonstrate the effectiveness of the model for sentiment analysis of bullet screen texts, which can help to track public opinion on trending events and to guide it in a timely manner.

Keywords:

social media sentiment analysis; bullet screen comments; sentiment lexicon; RoBERTa; CNN; Russia–Ukraine conflict

1. Introduction

In recent years, with the continuous development of social media, many new forms of Internet text commentary have emerged, and bullet screen comments are one of them. Bullet screen comments are popular mainly on video platforms in China and Japan [1], such as niconico and bilibili, and their users are mainly young netizens aged 10 to 30. These comments scroll from right to left across the video window as the video plays, and they have become a new way of expressing emotions towards videos. Compared with text comments on traditional social media, bullet screen comments are more real-time, are generally short, and contain many words coined within specific online subcultures. For example, in discussions of the Russia–Ukraine war, “鹅友” and “乌贼” refer to netizens who support Russia and Ukraine, respectively, although their original meanings are “friend of the goose” and “squid”. Netizens who support Russia use phrases such as “Ura” to express that support, whereas netizens who support Ukraine may use the same word “Ura” to mock Russia. Most bullet screens are as short as “Ura”: netizens express their emotions in a few simple words, and these words are popular only briefly and may fall out of use within a few months. Such vocabulary poses a challenge to traditional classification methods.

Moreover, because bullet screen comments are popular mainly in China and Japan and have not spread globally, there is comparatively little research on their emotional analysis. Related studies focus mainly on the form and function of bullet screen comments, and few apply deep learning methods to analyze their emotions.

At present, emotion analysis of bullet screens relies mainly on pre-trained models and sentiment lexicons and mostly targets emotional polarity. Polarity classification usually involves only two or three categories, such as positive, negative, and neutral; the boundaries between these categories are relatively clear, and judging emotional intensity is relatively simple. Pre-trained models are usually trained on large-scale general text, so their performance in specific domains may fall short of that of a sentiment lexicon, and frequently updating them for such short-lived words requires substantial resources, even for models with many parameters. A sentiment lexicon, by contrast, can quickly adjust the sentiment scores of bullet screen words at the word level and can be customized to the vocabulary and language habits of a specific domain, so it adapts better to text from that domain. It can also include rare but emotionally significant domain-specific words. However, a sentiment lexicon cannot handle polysemy well, whereas a pre-trained model can interpret the semantics and emotional tendency of text in different contexts.

For sentiment analysis of bullet screen text, which contains special vocabulary and tends to be short, we propose a multi-class bullet screen emotion classification method that combines RoBERTa + CNN with a sentiment lexicon. It identifies finer-grained emotion categories rather than only emotional polarity, giving a more comprehensive view of the emotions expressed in bullet screens. For bullet screens on a specific topic, adding domain-specific words to the sentiment lexicon and assigning the lexicon a higher weight allows the emotion of such words to be recognized more quickly and accurately when contextual information is insufficient. To verify the classification ability of the proposed model on a specific topic, we collected a bullet screen dataset on the theme of “The Russia–Ukraine War”. Experiments show that incorporating the sentiment lexicon improves the emotion classification of domain-specific vocabulary, while the pre-trained model captures the contextual information of sentences, extracts deep textual features, and interprets semantics in different contexts. Together, these improve the model’s emotion classification of short texts such as bullet screens. When studying bullet screens on different topics and encountering short-lived words specific to particular online subcultures, the emotion scores of those words can be adjusted quickly without consuming large amounts of computing resources. For online public opinion analysis, emotion analysis of highly real-time bullet screen text helps to track the trend and direction of online public opinion dynamically and contributes to preventing online public opinion risks [2].

Overall, we present a novel approach to sentiment analysis of bullet screen comments, a form of real-time, short-text commentary prevalent on video platforms in China and Japan. By combining the strengths of pre-trained models like RoBERTa and the flexibility of sentiment lexicons, the proposed method achieves more nuanced emotional classification beyond traditional polarity analysis. Furthermore, we collected and organized a bullet screen dataset with the theme of the Russia–Ukraine War to evaluate the effectiveness of the proposed method. Through experimental verification, our method shows a significant improvement in handling the sentiment classification of bullet screens on specific themes.

2. Related Works

In recent years, sentiment analysis has attracted significant attention because of its potential to assess public sentiment and its influence in fields such as social politics and commerce. Bullet screen sentiment analysis is one type of sentiment analysis of social media text. The rapid spread of emotions through social media platforms plays a crucial role in shaping public opinion, prompting extensive research on analyzing emotions in texts such as tweets and bullet screens. Researchers have employed a variety of methods, including sentiment lexicons and deep learning models, to improve the accuracy and relevance of sentiment classification. These studies highlight the advantages and limitations of lexicon-based approaches, particularly in handling domain-specific and evolving vocabularies, and demonstrate the superior feature extraction and contextual understanding of deep learning techniques such as CNN and Long Short-Term Memory (LSTM) networks in emotion classification.

2.1. Sentiment Analysis of Social Media Text

The emotions contained in social media texts reflect netizens’ emotional tendencies in many areas, such as social politics. These emotions spread with social media content and in turn affect online public opinion. Therefore, studying the emotions of social media texts such as bullet screens is of great significance for predicting and evaluating trends in online public opinion. Georgiadou et al. [3] conducted sentiment analysis on Twitter posts to investigate and aggregate public sentiment on the outcome of Brexit. Rosario Catelli et al. [4] used NooJ for word segmentation and word frequency statistics on tweets and proposed a lexicon-based method for the emotion analysis of tweets about COVID-19. Cheng Z et al. [5] proposed a bullet screen emotion analysis model combining ALBERT and CNN, which analyzes the contextual information and local features of the text to obtain its emotional polarity and can handle polysemy. Xuqiang Z et al. [6] proposed an attention-based LSTM emotion analysis model that more effectively exploits the dependencies between preceding and following bullet screens and mines the keywords in bullet screens. Hsieh and Zeng [7] focused specifically on the sentiment analysis of bullet screen comments and proposed the ERNIE-BiLSTM method to address its difficulties, such as the short length of the comments and the ambiguity of their sentiment information.

2.2. Sentiment Analysis Based on Sentiment Lexicon

A sentiment lexicon is a tool for analyzing the emotional tendency of texts. It usually contains a large number of words, each marked with an emotional polarity such as positive, negative, or neutral. In natural language processing, sentiment lexicons are used for emotion analysis of texts such as social media comments, product reviews, and news reports. Different domains and topics may also require specially customized sentiment lexicons to meet specific text analysis needs, and such lexicons play an important role in text mining and natural language processing. Chedia Dhaoui et al. [8] used a lexicon-based method to classify the emotions of comments related to 83 luxury fashion brands on Facebook and found that lexicon-based and machine learning methods performed very similarly in sentiment analysis. Li Yang et al. [9] proposed a method combining a sentiment lexicon, CNN, a Gated Recurrent Unit (GRU), and an attention mechanism to analyze the emotional tendency of product reviews on e-commerce platforms, using the sentiment lexicon to weight the word vectors of the input text and thereby strengthen its emotional features. The mainstream Chinese sentiment lexicons currently include Chinese EmoBank [10], HowNet [11], and the sentiment lexicon of Dalian University of Technology [12]. However, sentiment lexicons may not cover all words and expressions, especially emerging Internet terms, domain-specific terms, and culturally specific words; newly popular words may be missing from traditional lexicons. In addition, the same word may carry different emotional tendencies in different contexts, so relying only on the lexicon may produce incorrect judgments. Moreover, many words have multiple meanings, and a sentiment lexicon can hardly distinguish their specific meaning and emotion in a given context. For these reasons, sentiment lexicons alone perform less than ideally in the emotion analysis of domain texts.

2.3. Sentiment Analysis Based on Deep Learning

In recent years, deep learning techniques have made remarkable progress in emotion classification, and many researchers have explored different deep learning architectures and algorithms to improve its accuracy and generalization. CNNs have shown strong feature extraction capabilities in emotion classification. Bonggun Shin et al. [13] proposed an emotion classification model combining a CNN with a sentiment lexicon, which automatically learns and extracts local features of the text and effectively captures short-distance dependencies between words. Because a Recurrent Neural Network (RNN) connects context and thereby captures the sequential information of text, RNNs and their variants, LSTM and GRU, are also widely used in emotion classification. Jin Wang et al. [14] proposed a tree-structured regional CNN-LSTM model that captures both local information within sentences and long-distance dependencies across sentences; its regional division strategy identifies task-relevant regions. Wei Li et al. [15] proposed dual-channel CNN-LSTM and CNN-BiLSTM models that incorporate sentiment lexicon information. Their sentiment padding method makes the input size consistent and increases the proportion of sentiment information in each comment, alleviating the vanishing gradient problem that arises with zero padding and producing high-quality lexicon features. Mao et al. [16] proposed a hybrid word embedding that combines the sentiment information in a sentiment lexicon with traditional word vectors; it retains the interpretability of word embeddings while adding external sentiment information, improving the accuracy of sentiment classification.

The introduction of the attention mechanism brought new breakthroughs to emotion classification, and pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) [17] and Embeddings from Language Models (ELMo) [18] provide a powerful foundation for it. Yan et al. [19] proposed an attention-based parallel two-channel deep learning hybrid model to address the difficulty earlier emotion analysis methods had in capturing the emotional features of text and identifying word ambiguity. The model uses BERT to encode the input text, which significantly improves emotion classification performance.

In emotion analysis, both lexicon-based and deep learning methods have achieved some success. Lexicon-based methods, which classify and annotate a large number of emotional words, can identify the basic emotional tendency of a text relatively accurately, but they may misjudge complex language expressions and contexts [20]. Deep learning methods such as CNN, LSTM, and attention-based models can automatically capture deep semantic information in text and understand implicit and metaphorical emotional expressions. However, they often require large amounts of labeled data and computing resources, and their interpretability is relatively weak [21].

3. Methodology

This section describes how the “Russia–Ukraine War” dataset used in the experiments was constructed and details each component of the model.

3.1. Data Collection and Preprocessing

The “Russia–Ukraine War” themed bullet screen dataset used in our experiments was collected and built by our team. Yupan Liu collected the data, and Shengshi Yu, Kai Zhang, and Tengfei Song carried out the annotation. Each word label includes emotional polarity, emotional intensity, and emotion category. The inter-annotator agreement (ITA) rate is approximately 85%.

The dataset was collected from the Chinese bullet screen video website bilibili, on the theme of “the Russia–Ukraine War”. A web crawler was used to collect bullet screens, in XML format, posted between 24 February 2022 and 11 April 2024. The bullet screens were then cleaned by removing emoticons, meaningless symbols, and repeated meaningless words. The resulting original dataset contains 14,560 bullet screen messages, all manually labeled with emotion labels. Following the emotion categories of the Dalian University of Technology sentiment lexicon, the bullet screens were divided into seven categories: “fear”, “sadness”, “angry”, “happiness”, “disgust”, “surprise”, and “no emotion”. The proportion of each emotion in the dataset is shown in Figure 1; the distribution is uneven, with “sadness” and “angry” relatively underrepresented. Imbalanced data can reduce the classification performance of the model and may lead to overfitting during training. Therefore, synonym-replacement data augmentation was applied to the existing “sadness” and “angry” samples in the original dataset to balance the class proportions, and the augmented data form the “enhanced dataset”. We also built a word cloud of the most frequent words in the dataset, which shows the high-frequency words more intuitively. The word cloud and the enhanced dataset are shown in Table 1.
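The synonym-replacement augmentation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each bullet screen has already been segmented into words, and the synonym table `SYNONYMS`, the replacement probability `p`, and the helper names are hypothetical.

```python
import random

# Hypothetical toy synonym table (assumption); a real setup would use a
# Chinese synonym resource. 难过/伤心/悲伤 ~ "sad", 生气/愤怒/恼火 ~ "angry".
SYNONYMS = {
    "难过": ["伤心", "悲伤"],
    "生气": ["愤怒", "恼火"],
}


def synonym_replace(words, synonyms, p=0.5, seed=None):
    """Return a copy of `words` in which each word that has synonyms is
    replaced by a randomly chosen synonym with probability p."""
    rng = random.Random(seed)
    augmented = []
    for word in words:
        candidates = synonyms.get(word)
        if candidates and rng.random() < p:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(word)
    return augmented


def augment_minority_class(samples, synonyms, copies=1, p=0.5, seed=0):
    """Create `copies` augmented variants of every (words, label) sample,
    e.g. for the under-represented "sadness" and "angry" classes."""
    rng = random.Random(seed)
    new_samples = []
    for words, label in samples:
        for _ in range(copies):
            new_words = synonym_replace(words, synonyms, p, seed=rng.randrange(2**32))
            new_samples.append((new_words, label))
    return new_samples


# Usage: two augmented variants of each of two "sadness" samples.
sad_samples = [(["我", "很", "难过"], "sadness"), (["太", "难过", "了"], "sadness")]
extra = augment_minority_class(sad_samples, SYNONYMS, copies=2, p=1.0)
```

Adding augmented samples only to the minority classes is what rebalances the label distribution; the original samples stay unchanged.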

3.2. Embedding Representation

BERT is a deep language model based on the Transformer architecture, proposed by Google in 2018. Its architecture, shown in Figure 2, consists mainly of stacked Transformer encoders, and the self-attention in every layer is bidirectional. Through multi-head attention, the model can attend to different positions of the input tokens and weight them according to their importance. The input representation of BERT is the sum of three embeddings, Token Embedding, Segment Embedding, and Position Embedding, which represent semantics, distinguish different sentences, and encode the order of tokens, respectively. By combining these three embeddings, BERT can capture the semantics of a text more comprehensively and accurately. BERT has two main pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). For MLM, WordPiece tokenization splits a complete word into several sub-words; when training samples are generated, these sub-words are randomly masked, and the model must predict the masked sub-words from their context. This encourages the model to learn the internal structure and semantic relationships of words and improves its language understanding. However, the tokenization used in the Chinese BERT released by Google ignores the integrity of Chinese words and masks at the character level, which may split a complete Chinese word into individual characters that are meaningless on their own. In view of the characteristics of Chinese, the Harbin Institute of Technology proposed a whole word masking (wwm) training method [22]. This method keeps the BERT architecture but changes the masking: it uses the Harbin Institute of Technology Language Technology Platform (LTP) for word segmentation and masks all Chinese characters of the same word together. In addition, besides the original Chinese Wikipedia, the training corpus was extended with other data such as encyclopedias, news, and Q&A, which improves the model’s performance on Chinese classification tasks. RoBERTa-wwm-ext applies whole word masking together with RoBERTa-style pre-training, which, among other changes, drops the NSP task. Given these advantages, RoBERTa-wwm-ext-large is selected as part of the feature extraction model in this paper.
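To make the difference between character-level masking and whole word masking concrete, the following minimal sketch masks Chinese text at the word level: once a word is selected, every one of its characters becomes [MASK]. It assumes the text has already been segmented into words (the original wwm work used LTP for this), omits BERT's 80/10/10 replacement rule, and uses a hypothetical function name and masking rate; it is an illustration of the idea, not the pre-training code.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_rate=0.15, seed=None):
    """Whole word masking for Chinese: when a word is selected, all of its
    characters are replaced by [MASK] together.

    `words` is a list of segmented words; the result is the character-level
    token list that a Chinese BERT tokenizer would see.
    """
    if not words:
        return []
    rng = random.Random(seed)
    n_to_mask = min(len(words), max(1, round(len(words) * mask_rate)))
    masked_ids = set(rng.sample(range(len(words)), n_to_mask))
    tokens = []
    for i, word in enumerate(words):
        if i in masked_ids:
            tokens.extend([MASK] * len(word))  # mask every character of the word
        else:
            tokens.extend(word)  # keep the characters unchanged
    return tokens


# Usage: 俄罗斯 / 表现 / 很 / 好 ("Russia / performed / very / well").
words = ["俄罗斯", "表现", "很", "好"]
tokens = whole_word_mask(words, mask_rate=0.25, seed=1)
```

With character-level masking, a word such as 俄罗斯 could end up only partly masked, letting the model recover it from the visible characters; whole word masking removes that shortcut.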

3.3. Improved Chinese Sentiment Lexicon

The Dalian University of Technology Sentiment Lexicon is a widely used Chinese sentiment lexicon. It contains 27,466 sentiment words, divided into seven major categories (“Joy”, “Good”, “Anger”, “Sadness”, “Surprise”, “Fear”, and “Disgust”) and 21 subcategories. Each word has an emotion type, an emotion intensity, and an emotion polarity, with intensity represented by the values 1, 3, 5, 7, and 9. In our experiments, the classification labels follow the seven major categories of this lexicon, and bullet screen emotions are classified into seven categories: “Fear”, “Sadness”, “Angry”, “Happiness”, “Disgust”, “Surprise”, and “No Emotion”. For the theme of “the Russia–Ukraine War”, we supplemented the lexicon with theme-specific bullet screen words such as “鹅友” and “乌贼”, adding and annotating a total of 68 frequently occurring words whose meanings differ from their common usage. The input text is segmented into words using the Chinese word segmentation tool jieba, and the words are filtered with a stop word list. Each remaining word is then looked up in the sentiment lexicon to obtain its emotion type and intensity, and the intensities are accumulated per category to produce the final seven-dimensional emotion score vector of the text:

E = \{e_{1}, e_{2}, e_{3}, e_{4}, e_{5}, e_{6}, e_{7}\}

(1)

Here, e_{i} is the cumulative score of the i-th of the seven emotion categories.
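The scoring step that produces E can be sketched as follows. This is a minimal illustration, not the authors' code: the category order, the toy lexicon entries, and the stop word list are hypothetical, and the input is assumed to be already segmented (in the paper this is done with jieba, e.g. `jieba.lcut(text)`, which is left out so the sketch has no external dependencies).

```python
# Category order of the score vector E (assumed; the paper does not fix an order).
CATEGORIES = ["fear", "sadness", "angry", "happiness", "disgust", "surprise", "no emotion"]

# Hypothetical toy lexicon: word -> (category, intensity), with intensities
# drawn from {1, 3, 5, 7, 9} as in the DUT lexicon.
LEXICON = {
    "开心": ("happiness", 5),  # "happy"
    "难过": ("sadness", 5),    # "sad"
    "可怕": ("fear", 7),       # "frightening"
}
STOP_WORDS = {"的", "了", "啊"}  # hypothetical stop word list


def lexicon_score_vector(words, lexicon=LEXICON, stop_words=STOP_WORDS):
    """Build E = {e_1, ..., e_7} by summing the intensities of matched
    lexicon words per emotion category. `words` is the segmented text."""
    scores = [0.0] * len(CATEGORIES)
    for word in words:
        if word in stop_words:
            continue  # stop-word filtering
        entry = lexicon.get(word)
        if entry is None:
            continue  # words missing from the lexicon contribute nothing
        category, intensity = entry
        scores[CATEGORIES.index(category)] += intensity
    return scores


E = lexicon_score_vector(["开心", "的", "难过", "难过"])
```

Because scores are accumulated, a word that occurs twice counts twice; supplementing the lexicon with theme-specific words such as the 68 added entries only requires new dictionary entries, not retraining.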

To combine the lexicon result with the softmax output in the subsequent weighted sum, the emotion score vector is mapped to the range [0, 1] by min-max normalization, as shown in Formula (2).

{\hat{e}}_{i} = \frac{e_{i} - e_{\min}}{e_{\max} - e_{\min}}

(2)

Here, {\hat{e}}_{i} is the normalized value, and e_{\min} and e_{\max} are the minimum and maximum values in E, respectively. After min-max normalization, the lexicon score vector V = \{{\hat{e}}_{1}, {\hat{e}}_{2}, {\hat{e}}_{3}, {\hat{e}}_{4}, {\hat{e}}_{5}, {\hat{e}}_{6}, {\hat{e}}_{7}\} is obtained.
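Formula (2) is simple, but it has an edge case worth making explicit: when every entry of E is equal (for example, when no word of a bullet screen is in the lexicon), the denominator is zero. The sketch below returns an all-zero vector in that case; this fallback is our assumption, not something stated in the text.

```python
def min_max_normalize(scores):
    """Formula (2): map each e_i to (e_i - e_min) / (e_max - e_min)."""
    e_min, e_max = min(scores), max(scores)
    if e_max == e_min:
        # Assumption: Formula (2) is undefined when all scores are equal
        # (e.g. no lexicon word matched); return all zeros in that case.
        return [0.0] * len(scores)
    return [(e - e_min) / (e_max - e_min) for e in scores]


V = min_max_normalize([0.0, 10.0, 0.0, 5.0, 0.0, 0.0, 0.0])
```

With this fallback, a bullet screen without lexicon hits contributes nothing from the lexicon branch of the weighted sum, so its prediction is decided by the RoBERTa+CNN probabilities alone.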

3.4. Multi-Scale Feature Extraction

The model incorporates multi-scale feature extraction using three groups of convolution kernels of different sizes: 3 × 3, 5 × 5, and 7 × 7. Convolution captures local features in the text, and using kernels of several sizes makes the extracted features less dependent on any single window size. The output of RoBERTa-wwm serves as the input of the CNN, and these kernels extract local features from the sequence matrix output by RoBERTa-wwm. Each feature value is calculated as shown in Formula (3).

c_{i} = \mathrm{ReLU}(W \cdot X_{i:i+h-1} + b)

(3)

where c_{i} is the convolution output at position i, W is the convolution kernel weight, b is the bias term, h is the kernel window size, and X_{i:i+h-1} is the submatrix of the embedding matrix from position i to i + h - 1. ReLU is the activation function. The feature values output by ReLU serve as the input of the pooling layer, as shown in Formulas (4) and (5).

c = \{c_{1}, c_{2}, \dots, c_{n-h+1}\}

(4)

\hat{c} = \max(c)

(5)

where n is the length of the input sequence, c is the set of feature values produced by one convolution kernel, and \hat{c} is the value output by max pooling. The pooled values of all convolution kernels are then concatenated to obtain the final feature vector M, as shown in Formula (6).

M = \{{\hat{c}}_{1}, {\hat{c}}_{2}, \dots, {\hat{c}}_{K}\}

(6)

where K is the total number of convolution kernels across the three kernel sizes.
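Formulas (3) to (6) can be traced with the following pure-Python sketch for a single bullet screen. It rests on several assumptions: following the common text-CNN design, each kernel is taken to span h consecutive token vectors and the full embedding width d (so the listed kernel sizes are read as window sizes 3, 5, and 7); the toy matrix `X` and the random kernels stand in for RoBERTa-wwm outputs and learned weights; and the zero-padding of inputs shorter than a kernel is our addition. A real model would use a deep learning framework's convolution layers.

```python
import random


def relu(x):
    return max(0.0, x)


def conv_feature_map(X, W, b):
    """Formulas (3)-(4): c_i = ReLU(W . X[i:i+h-1] + b) for every window.

    X: n x d matrix (one row per token), W: h x d kernel, b: scalar bias.
    Returns the n - h + 1 feature values c_1 ... c_{n-h+1}.
    """
    h, d = len(W), len(X[0])
    if len(X) < h:
        # Assumption: zero-pad inputs shorter than the kernel so every kernel
        # yields at least one value (very short bullet screens are common).
        X = X + [[0.0] * d for _ in range(h - len(X))]
    feature_map = []
    for i in range(len(X) - h + 1):
        window = X[i:i + h]
        dot = sum(w * x for w_row, x_row in zip(W, window) for w, x in zip(w_row, x_row))
        feature_map.append(relu(dot + b))
    return feature_map


def multi_scale_features(X, kernels):
    """Formulas (5)-(6): max-pool each feature map, then concatenate into M."""
    return [max(conv_feature_map(X, W, b)) for W, b in kernels]


def random_kernel(h, d, rng):
    """Random h x d kernel and bias, standing in for learned parameters."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(h)]
    return W, rng.uniform(-0.1, 0.1)


# Usage: toy 10 x 8 "RoBERTa output" and two kernels per window size 3, 5, 7.
rng = random.Random(0)
d = 8
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(10)]
kernels = [random_kernel(h, d, rng) for h in (3, 5, 7) for _ in range(2)]
M = multi_scale_features(X, kernels)  # one pooled value per kernel, so len(M) == 6
```

Max pooling reduces each feature map to a single number regardless of sentence length, which is why bullet screens of different lengths all yield an M of the same size.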

The feature vector M is fed into a fully connected layer, which maps it from the feature space to a vector R over the emotion categories. Finally, R is converted into the probability vector P by the softmax function, as shown in Formula (7).

P = \mathrm{softmax}(R)

(7)
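A minimal sketch of this classification head, assuming a single fully connected layer followed by softmax; the weights and the three-class toy input are hypothetical (the paper's head outputs seven categories).

```python
import math


def fully_connected(M, weights, bias):
    """Map the feature vector M to class scores R = weights . M + bias.

    `weights` has one row per emotion category (7 rows in the paper's setting).
    """
    return [sum(w * m for w, m in zip(row, M)) + b for row, b in zip(weights, bias)]


def softmax(R):
    """Formula (7). Subtracting max(R) first avoids overflow in exp and does
    not change the result."""
    r_max = max(R)
    exps = [math.exp(r - r_max) for r in R]
    total = sum(exps)
    return [e / total for e in exps]


R = fully_connected([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 1.0])
P = softmax(R)
```

The max-subtraction trick matters in practice: without it, large logits such as 1000 would overflow `math.exp`.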

The obtained probability vector P and the emotion score vector V obtained from the sentiment lexicon are weighted and summed to obtain the final emotion classification probability vector F, as shown in Formula (8).

F = \mu P + (1 - \mu) V

(8)

μ is a manually set weight hyperparameter. In our experiments, bullet screens are divided into three length groups, [0, 5), [5, 15), and [15, 25], and μ is set to 0.2, 0.7, and 0.9, respectively, so that the sentiment lexicon and RoBERTa-wwm receive different weights in each group. Short bullet screens, which offer little context, therefore rely mostly on the lexicon (weight 1 - μ = 0.8), whereas long bullet screens rely mostly on RoBERTa+CNN (weight μ = 0.9).
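The length-dependent fusion of Formula (8) can be sketched as below. Two points are assumptions not spelled out in the text: length is counted in characters, and the predicted class is the index of the largest entry of F. The toy three-class vectors are illustrative only.

```python
# Length groups and mu values reported in the text:
# [0, 5) -> 0.2, [5, 15) -> 0.7, [15, 25] -> 0.9.
MU_BY_LENGTH = [(0, 5, 0.2), (5, 15, 0.7), (15, 26, 0.9)]  # upper bounds exclusive


def mu_for_length(length):
    """Return the model weight mu for a bullet screen of `length` characters
    (assumption: length is counted in characters)."""
    for low, high, mu in MU_BY_LENGTH:
        if low <= length < high:
            return mu
    raise ValueError(f"length {length} is outside the groups used in the paper")


def fuse(P, V, length):
    """Formula (8): F = mu * P + (1 - mu) * V. Returns F and the index of its
    largest entry (assumption: the predicted class is argmax F)."""
    mu = mu_for_length(length)
    F = [mu * p + (1 - mu) * v for p, v in zip(P, V)]
    predicted = max(range(len(F)), key=F.__getitem__)
    return F, predicted


# Usage with toy 3-class vectors: the lexicon favours class 0, the model class 1.
P = [0.1, 0.6, 0.3]  # softmax output of RoBERTa+CNN
V = [1.0, 0.0, 0.5]  # normalized lexicon scores
_, short_pred = fuse(P, V, length=3)   # mu = 0.2: the lexicon dominates
_, long_pred = fuse(P, V, length=20)   # mu = 0.9: the model dominates
```

The same pair (P, V) yields different predictions depending on length, which is exactly the behaviour the length-based weighting is meant to produce.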

4. Experiments and Results Analysis

To verify the feasibility and reliability of the proposed method for the task of bullet screen sentiment analysis, in this section, we describe the model architecture used in the experiment, the settings of hyperparameters in the model, the evaluation metrics of the experiment, the experimental results and the analysis of the results.

4.1. Experiment Design

Combining the methods above, we propose a model that integrates the sentiment lexicon, multi-scale feature extraction, and RoBERTa-wwm to capture the emotional information in bullet screens on the “Russia–Ukraine War” theme. The overall structure of the model is shown in Figure 3.

To ensure the validity of the experimental results, we used 5-fold cross-validation. The dataset was randomly divided into five subsets of approximately equal size, each containing approximately 4000 samples, and five iterations were run. In each iteration, one subset served as the validation set and the remaining four were combined as the training set. The validation set was used to evaluate the model, and the results of the five evaluations were averaged to obtain the final performance, which reduces the influence of random error. The training parameter settings of the RoBERTa-wwm model are shown in Table 2, and cross-entropy is used as the loss function.
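The 5-fold procedure above can be sketched as follows. `train_and_evaluate` is a hypothetical callback that would fine-tune RoBERTa+CNN on the training indices and return a validation metric; plain random splitting is used here to match the description.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k folds whose sizes
    differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_evaluate, k=5, seed=0):
    """Each fold serves once as the validation set while the other k - 1 folds
    form the training set; the k validation scores are averaged."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_evaluate(train_idx, val_idx))
    return sum(scores) / k


# Usage with a stand-in for "train the model and return its weighted F1".
mean_val_size = cross_validate(20, lambda train_idx, val_idx: len(val_idx))
```

Given the class imbalance noted in Section 3.1, a stratified split (keeping label proportions equal across folds) would be a natural variant of this sketch.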

In this experiment, the learning rate, the weight allocation value μ, and the batch size were selected as the hyperparameters to tune. Their value ranges were set based on prior experience and preliminary experimental results: the learning rate ranged over [2^{-5}, 2^{-3}], the batch size took the values {32, 64, 128}, and μ took the settings [0.2, 0.7, 0.9], [0.5, 0.5, 0.5], and [0.9, 0.7, 0.2], each listing the values for the three length groups. The grid search traverses all parameter combinations, for a total of 3 × 3 × 3 = 27 training runs.
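The grid search can be sketched with `itertools.product`, which makes the 3 × 3 × 3 = 27 count explicit. Two parts are assumptions: `evaluate` is a hypothetical callback that would train the model and return a validation score, and the three learning-rate values are an assumed grid inside the stated range, since only the endpoints are given.

```python
import itertools

# Assumption: only the learning-rate range [2^-5, 2^-3] is given;
# a three-point grid inside that range is assumed here.
LEARNING_RATES = [2 ** -5, 2 ** -4, 2 ** -3]
BATCH_SIZES = [32, 64, 128]
# Each mu setting lists the weights for the length groups [0, 5), [5, 15), [15, 25].
MU_SETTINGS = [(0.2, 0.7, 0.9), (0.5, 0.5, 0.5), (0.9, 0.7, 0.2)]


def grid_search(evaluate):
    """Evaluate every (learning rate, batch size, mu) combination and return
    (best_score, best_params). `evaluate` is a hypothetical callback that would
    train the model with the given settings and return a validation score."""
    best = None
    for lr, batch_size, mu in itertools.product(LEARNING_RATES, BATCH_SIZES, MU_SETTINGS):
        score = evaluate(lr=lr, batch_size=batch_size, mu=mu)
        if best is None or score > best[0]:
            best = (score, {"lr": lr, "batch_size": batch_size, "mu": mu})
    return best
```

In a full run, each call to `evaluate` would itself be a 5-fold cross-validation, so the search costs 27 × 5 training runs in total.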

The experimental results indicate that the batch size has little influence on model accuracy, possibly because the model itself is fairly stable or because the nature of this task makes the learning process insensitive to batch size. The best-performing combination is μ = [0.2, 0.7, 0.9], a batch size of 32, and a learning rate of 2^{-5}. A plausible explanation is that the μ setting [0.2, 0.7, 0.9] gives the sentiment lexicon the largest weight on short bullet screens, where context is scarce, and lets RoBERTa+CNN dominate on longer ones; a batch size of 32 may strike a good balance between gradient noise and training efficiency; and a learning rate of 2^{-5} may allow fast learning while avoiding excessive oscillation.

4.2. Evaluation Indicators

To compare the classification methods, all of them are evaluated on the basis of the confusion matrix, which records true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Three multi-class evaluation metrics are used: weighted precision (WP), weighted recall (WR), and weighted F_{1} (WF_{1}).

WP = \frac{1}{N} \sum_{i=1}^{m} n_{i} \frac{TP_{i}}{TP_{i} + FP_{i}}

(9)

WR = \frac{1}{N} \sum_{i=1}^{m} n_{i} \frac{TP_{i}}{TP_{i} + FN_{i}}

(10)

WF_{1} = \frac{1}{N} \sum_{i=1}^{m} n_{i} \frac{2 P_{i} R_{i}}{P_{i} + R_{i}}

(11)

In the above formulas, N is the total number of samples; m is the number of categories; n_{i} is the number of samples in the i-th category; TP_{i}, FP_{i}, and FN_{i} are the numbers of true positives, false positives, and false negatives for the i-th category, respectively; and P_{i} and R_{i} are the precision and recall of the i-th category, respectively.
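The weighted metrics can be computed directly from label lists, as in the sketch below. It implements Formulas (9) and (10) and computes weighted F1 as the n_i-weighted average of per-class F1 scores in the same way; per-class scores with a zero denominator are set to 0, which is our choice. This corresponds to what scikit-learn calls `average="weighted"`.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Weighted precision, recall, and F1: each per-class score is weighted by
    n_i, the number of true samples of class i, and divided by N."""
    N = len(y_true)
    support = Counter(y_true)  # n_i for every class present in y_true
    wp = wr = wf = 0.0
    for c, n_i in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_i - tp  # true samples of class c predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn)  # tp + fn == n_i > 0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        wp += n_i * precision
        wr += n_i * recall
        wf += n_i * f1
    return wp / N, wr / N, wf / N


wp, wr, wf = weighted_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Here class "a" has P = 1 and R = 0.5, and class "b" has P = 2/3 and R = 1, so WP = 5/6, WR = 0.75, and WF1 = 11/15. Note that weighted recall always equals overall accuracy.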

4.3. Experimental Results

The experimental results of the proposed model are shown in Table 3, which shows that the RoBERTa-CNN model fused with the sentiment lexicon achieves good results on multiple evaluation metrics for bullet screen emotion classification. Compared with RoBERTa-wwm alone, our model improves weighted precision by 4%, indicating the effectiveness of the CNN’s multi-scale local feature extraction. As Figure 4 shows, the RoBERTa+CNN model with the sentiment lexicon achieves 1.91% higher weighted precision than the same model without it, indicating that the sentiment lexicon module better captures the emotional characteristics of domain-specific vocabulary. We also compared our model with commonly used emotion classification models: compared with RoBERTa+RCNN, Regional CNN+LSTM, ERNIE+BiLSTM, and RoBERTa+BiLSTM, the proposed model achieves 2.35%, 2.1%, 0.24%, and 3.27% higher weighted precision, respectively. Compared with the Word2Vec+CNN model, it improves weighted precision by 2.75%, indicating that the RoBERTa-wwm pre-trained model better captures the contextual information of words and improves emotion classification performance.

During the experiments, we found that some texts have ambiguous semantics. For example, the sentiment of a sentence such as “Russia has performed well, but also exposed many shortcomings” is unclear. Errors of this kind are hard to characterize and avoid, because such texts mix several viewpoints and it is difficult for the model to decide whether the overall sentiment is positive or negative. The model is also prone to errors on newly emerged words that are not in the sentiment lexicon, because these words may be missing from the model's vocabulary or their sentiment may not have been adequately learned during pre-training. Such errors stem from the limited vocabulary coverage of the model and manifest as an inability to handle specific words. For example, for the newly emerged term “烤鹅” (literally “roast goose”) in the Russia–Ukraine war topic, it is very difficult to judge the sentiment of a sentence containing it unless the vocabulary is updated and the term's sentiment is learned.
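Both failure modes can be seen in a purely lexicon-based score. The sketch below uses a hypothetical four-word toy lexicon (not the paper's improved lexicon) and assumes the text has already been segmented into tokens. It returns the summed polarity together with lexicon coverage: a mixed-sentiment sentence and a sentence whose only sentiment-bearing word is out of vocabulary both score 0.0, and only the coverage value tells them apart.

```python
# Hypothetical toy lexicon (token -> polarity); NOT the paper's improved lexicon.
# 好 = good, 强 = strong, 差 = poor, 短板 = shortcoming.
TOY_LEXICON = {"好": 1.0, "强": 1.0, "差": -1.0, "短板": -1.0}


def lexicon_score(tokens: list[str], lexicon: dict[str, float] = TOY_LEXICON) -> tuple[float, float]:
    """Return (summed polarity of known tokens, fraction of tokens found in the lexicon)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    coverage = len(hits) / len(tokens) if tokens else 0.0
    return float(sum(hits)), coverage


# Mixed sentiment: the known words cancel out (score 0.0, full coverage).
print(lexicon_score(["好", "差"]))
# New word absent from the lexicon: score 0.0, zero coverage.
print(lexicon_score(["又", "是", "烤鹅"]))
```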

5. Discussion

In this paper, we propose a sentiment analysis method for bullet screens on the theme of “the Russia–Ukraine War” that combines a sentiment lexicon with the multi-scale feature extraction of RoBERTa-CNN. The method combines the semantic representation ability of RoBERTa, the local feature extraction ability of the CNN, and the prior knowledge of the sentiment lexicon, exploiting the strengths of each component to improve sentiment analysis. To address the varying lengths of bullet screen texts and their many new words, we employ strategies such as data augmentation and length-based weighting, which allow the model to perform better on bullet screen sentiment analysis. Among all compared models, the proposed model achieves the highest weighted precision, weighted recall, and weighted F1 score, demonstrating the effectiveness of the method. The experiments also have limitations that suggest directions for improvement. Although we have improved and supplemented the sentiment lexicon, it may still fail to cover all words and expressions. Future work could construct a more comprehensive and accurate sentiment lexicon, or incorporate resources such as knowledge graphs to strengthen sentiment analysis. Another open question is whether a better method exists for determining the weights assigned to the sentiment lexicon and the pre-trained model. Finally, the experiments use only bullet screen data on “the Russia–Ukraine War”, so the model's adaptability to other domains or topics still needs to be verified; more cross-domain experiments could be carried out to evaluate its universality and generalization ability.
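The length-based weighting between the lexicon and the pre-trained model can be illustrated with a small sketch. Everything specific here is an assumption for illustration, not the paper's configuration: two classes, three length buckets with hypothetical thresholds and lexicon weights (shorter texts rely more on the lexicon, on the assumption that they give the encoder less context), and a tanh squashing of the raw lexicon score into probabilities. The fused output is a convex combination of the two probability vectors.

```python
import math

# Hypothetical length buckets: (max characters, weight given to the lexicon).
# The paper's actual thresholds and weights are not stated in this section.
LENGTH_WEIGHTS = [(5, 0.4), (15, 0.3), (math.inf, 0.2)]


def lexicon_weight(text: str) -> float:
    """Pick the lexicon weight of the first bucket the text length fits into."""
    return next(w for max_len, w in LENGTH_WEIGHTS if len(text) <= max_len)


def lexicon_probs(score: float) -> list[float]:
    """Squash a raw lexicon score into [P(negative), P(positive)] with tanh (assumed)."""
    p_pos = (1.0 + math.tanh(score)) / 2.0
    return [1.0 - p_pos, p_pos]


def fuse(text: str, model_probs: list[float], lex_score: float) -> tuple[list[float], int]:
    """Convex combination of model and lexicon probabilities; returns (probs, label)."""
    w = lexicon_weight(text)
    fused = [(1 - w) * pm + w * pl for pm, pl in zip(model_probs, lexicon_probs(lex_score))]
    return fused, max(range(len(fused)), key=fused.__getitem__)


# Same model output and lexicon score: the short text flips to positive (label 1),
# while a text longer than 15 characters keeps the model's negative prediction (label 0).
short_probs, short_label = fuse("打得好", [0.7, 0.3], 3.0)
long_probs, long_label = fuse("x" * 20, [0.7, 0.3], 3.0)
```

Learning these weights rather than fixing them by hand is one possible answer to the open question of how to split the weight between the lexicon and the pre-trained model.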

6. Conclusions

This study proposed a sentiment analysis method that integrates a sentiment lexicon with the multi-scale feature extraction of RoBERTa-CNN for bullet screens themed on “the Russia–Ukraine War”. The method combines RoBERTa's semantic representation, the CNN's local feature extraction, and the prior knowledge of the sentiment lexicon to improve sentiment analysis. Through strategies such as data augmentation and length-based weighting, the model addresses the varying text lengths and new words found in bullet screens, and it achieves higher weighted precision, weighted recall, and weighted F1 score than the compared models. The optimal weight distribution between the lexicon and the pre-trained model remains a topic for further exploration. Future research should focus on constructing a more comprehensive sentiment lexicon, possibly by integrating knowledge graphs, and on conducting more cross-domain experiments to assess the model's universality and generalization ability.

Author Contributions

Y.L. was responsible for building the model architecture, writing the paper, implementing the code, and preparing the figures and tables. S.Y. was responsible for server debugging and maintenance. S.W., the corresponding author, was responsible for providing ideas and solutions to problems and for refining and revising the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the Natural Science Foundation of Hebei Province (F2021201055) and the Innovation Capacity Enhancement Program-Science and Technology Platform Project of Hebei Province (22567623H).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to [email protected].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Xu, Y.; Wang, B.; Huang, J.; Liu, S. Natural language processing in “bullet screen” application. In Proceedings of the 2017 International Conference on Service Systems and Service Management, Dalian, China, 31 July 2017; pp. 1–6.
2. Gupta, A.; Tyagi, M.; Sharma, D. Use of social media marketing in healthcare. J. Health Manag. 2013, 15, 293–302.
3. Georgiadou, E.; Angelopoulos, S.; Drake, H. Big data analytics and international negotiations: Sentiment analysis of Brexit negotiating outcomes. Int. J. Inf. Manag. 2020, 51, 102048.
4. Catelli, R.; Pelosi, S.; Comito, C.; Pizzuti, C.; Esposito, M. Lexicon-based sentiment analysis to detect opinions and attitude towards COVID-19 vaccines on Twitter in Italy. Comput. Biol. Med. 2023, 158, 106876.
5. Zeng, C.; Wen, C.; Sun, Y.; Pan, L.; He, P. Bullet screen text emotion analysis based on ALBERT-CRNN. J. Zhengzhou Univ. Sci. Ed. 2021, 53, 1–8.
6. Zhuang, X.; Liu, F. Bullet screen comment emotion analysis based on AT-LSTM. Digit. Technol. Appl. 2018, 36, 210–212.
7. Hsieh, Y.H.; Zeng, X.P. Sentiment analysis: An ERNIE-BiLSTM approach to bullet screen comments. Sensors 2022, 22, 5223.
8. Dhaoui, C.; Webster, C.M.; Tan, L.P. Social media sentiment analysis: Lexicon versus machine learning. J. Consum. Mark. 2017, 34, 480–488.
9. Yang, L.; Li, Y.; Wang, J.; Sherratt, R.S. Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 2020, 8, 23522–23530.
10. Lee, L.H.; Li, J.H.; Yu, L.C. Chinese EmoBank: Building valence-arousal resources for dimensional sentiment analysis. Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 1–18.
11. Liu, Y.; Qi, F.; Liu, Z.; Sun, M. Research on consistency check of sememe annotations in HowNet. J. Chin. Inf. Process. 2020, 35, 23–34.
12. Xu, L.; Lin, H.; Pan, Y.; Ren, H.; Chen, J. The construction of emotional vocabulary ontology. J. China Soc. Sci. Tech. Inf. 2008, 27, 180–185.
13. Shin, B.; Lee, T.; Choi, J.D. Lexicon integrated CNN models with attention for sentiment analysis. arXiv 2016, arXiv:1610.06272.
14. Wang, J.; Yu, L.C.; Lai, K.R.; Zhang, X. Tree-structured regional CNN-LSTM model for dimensional sentiment analysis. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 28, 581–591.
15. Li, W.; Zhu, L.; Shi, Y.; Guo, K.; Cambria, E. User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM family models. Appl. Soft Comput. 2020, 94, 106435.
16. Mao, X.; Chang, S.; Shi, J.; Li, F.; Shi, R. Sentiment-aware word embedding for emotion classification. Appl. Sci. 2019, 9, 1334.
17. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
18. Peng, Y.; Yan, S.; Lu, Z. Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. arXiv 2019, arXiv:1906.05474.
19. Yan, C.; Liu, J.; Liu, W.; Liu, X. Research on public opinion sentiment classification based on attention parallel dual-channel deep learning hybrid model. Eng. Appl. Artif. Intell. 2022, 116, 105448.
20. Xu, G.; Yu, Z.; Yao, H.; Li, F.; Meng, Y.; Wu, X. Chinese text sentiment analysis based on extended sentiment dictionary. IEEE Access 2019, 7, 43749–43762.
21. Dang, N.C.; Moreno-García, M.N.; De la Prieta, F. Sentiment analysis based on deep learning: A comparative study. Electronics 2020, 9, 483.
22. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-training with whole word masking for Chinese BERT. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514.

Figure 1. The proportion of each sentiment category in the dataset, and the word cloud of the dataset (the high-frequency words in the word cloud mainly include the countries involved, such as Ukraine, Russia, and the United States, as well as words from comments related to the Russia–Ukraine war).

Figure 2. The structure of BERT.

Figure 3. The overall structure of the model.

Figure 4. The results of the sentiment lexicon-RoBERTa-CNN.

Table 1. The number of samples in the dataset.

Dataset	Training Set	Validation Set	Test Set
Original dataset	14,560	2,000	2,000
Enhanced dataset	21,760	2,500	2,500

Table 2. Parameter settings.

Parameter	Value
Batch size	32
Learning rate	$2 \times 10^{-5}$
Epochs	15
Dropout	0.4
Optimizer	Adam
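Table 2 can be captured as a small configuration object, and combined with Table 1 it fixes the number of optimizer updates. The sketch below assumes one update per batch (no gradient accumulation) and keeps a final partial batch; which split is used for training is not stated here, so both training sets from Table 1 are shown. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 2."""
    batch_size: int = 32
    learning_rate: float = 2e-5
    epochs: int = 15
    dropout: float = 0.4
    optimizer: str = "Adam"


def total_steps(num_train: int, cfg: TrainConfig) -> int:
    """Optimizer updates over training, assuming one update per (possibly partial) batch."""
    return math.ceil(num_train / cfg.batch_size) * cfg.epochs


cfg = TrainConfig()
steps_enhanced = total_steps(21_760, cfg)  # enhanced training set from Table 1
steps_original = total_steps(14_560, cfg)  # original training set from Table 1
```

Both training-set sizes are exact multiples of 32, so no partial batch arises: 680 and 455 batches per epoch, respectively.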

Table 3. The results of the sentiment lexicon-RoBERTa-CNN.

Setting	Model	WP (%)	WR (%)	WF (%)
Without sentiment lexicon	Word2Vec+CNN	80.42	80.22	79.81
	RoBERTa	79.21	76.58	74.50
	RoBERTa+RCNN	80.83	80.79	80.77
	Regional CNN-LSTM	80.98	80.83	80.77
	RoBERTa+BiLSTM	80.05	79.81	79.83
	ERNIE+BiLSTM	82.88	82.51	82.00
	RoBERTa+CNN	83.21	82.87	82.31
With sentiment lexicon	Word2Vec+CNN	82.37	82.19	81.77
	RoBERTa	81.12	78.46	76.35
	RoBERTa+RCNN	82.77	82.65	82.53
	Regional CNN-LSTM	83.02	82.86	82.70
	RoBERTa+BiLSTM	81.85	81.76	81.69
	ERNIE+BiLSTM	84.88	84.29	83.80
	RoBERTa+CNN	85.12	84.64	84.19

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
