EmoBERTa-X: Advanced Emotion Classifier with Multi-Head Attention and DES for Multilabel Emotion Classification
Abstract
1. Introduction
- Enhanced DES Framework: We enhance the internal structure of the DES framework, optimizing how it handles the complexities typical of the short, ambiguous texts found on social media.
- Integration of a Multi-Head Attention Mechanism: We extend the attention mechanism so the model focuses on the most relevant emotional cues within the text, which is particularly useful for multilabel prediction.
- Advanced Preprocessing: We introduce new preprocessing steps, such as abbreviation expansion and context-sensitive embedding refinement, that cope better with the informality of social media language.
2. Applied Methodology
2.1. Advanced Preprocessing Techniques
2.1.1. Abbreviation and Slang Expansion Module
2.1.2. Context-Sensitive Embedding Refinement
2.1.3. Token Variation and Noise Reduction
- Synonym Replacement and Token Normalization: TVNR applies synonym-replacement and token-normalization rules, so spelling variants and loose usages such as “gr8” for “great” are replaced with their normalized forms (a lookup-based sketch follows this list).
- Augmentation Strategies: During training, TVNR also performs token-level augmentations such as random insertion and synonym replacement to increase robustness. These augmentations help the model generalize to token variations it has not seen at test time.
- Noise Filtering: The module further reduces the impact of typos and contextually irrelevant characters, such as stray punctuation or emojis, by filtering them out or replacing such noise with contextually appropriate tokens.
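To make these rules concrete, the following is a minimal, illustrative Python sketch of the lookup-based expansion and normalization; the dictionary entries and the function name are hypothetical stand-ins, since the paper does not publish its actual tables.

```python
import re

# Hypothetical lookup tables -- illustrative entries only, not the paper's dictionaries.
ABBREVIATIONS = {"btw": "by the way", "lol": "laugh out loud", "omg": "oh my god"}
NORMALIZATIONS = {"gr8": "great", "thx": "thanks", "u": "you"}

def expand_and_normalize(text: str) -> str:
    """Expand abbreviations/slang (ASEM) and normalize token variants (TVNR)."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())  # split into words and punctuation
    out = []
    for tok in tokens:
        tok = ABBREVIATIONS.get(tok, tok)   # slang/abbreviation expansion
        tok = NORMALIZATIONS.get(tok, tok)  # spelling-variant normalization
        out.append(tok)
    return " ".join(out)

print(expand_and_normalize("omg gr8 job btw"))  # -> oh my god great job by the way
```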
2.1.4. Handling Text Length and Padding (HTLP)
2.2. Comprehensive Model Framework
For an input instance $x$, the optimal ensemble is selected as:

$$C^* = \underset{S \subseteq C}{\arg\max} \sum_{c \in S} w_c \, f_c(x)$$

where:
- $C$ denotes the complete set of available classifiers,
- $C^*$ represents the subset of $C$ within the ensemble that maximizes the weighted sum,
- $w_c$ is a weight parameter assigned to each classifier $c$, learned through the meta-learning process to reflect its effectiveness in the current context,
- $f_c(x)$ is the output from classifier $c$ for the input instance $x$.
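The Python sketch below illustrates this selection rule. The function name, the cap on ensemble size, and treating each classifier's output as a scalar confidence score are assumptions for illustration; the paper's meta-learning implementation is not reproduced here.

```python
import numpy as np
from itertools import combinations

def select_ensemble(outputs: np.ndarray, weights: np.ndarray, max_size: int = 4):
    """Return the classifier subset maximizing sum_{c in S} w_c * f_c(x).

    outputs: (n,) classifier outputs f_c(x) for the current input x
    weights: (n,) learned competence weights w_c
    Note: without a size cap (or possibly negative scores), the argmax would
    trivially include every classifier with a positive contribution.
    """
    best_score, best_subset = -np.inf, ()
    for k in range(1, max_size + 1):
        for subset in combinations(range(len(outputs)), k):
            idx = list(subset)
            score = float(np.sum(weights[idx] * outputs[idx]))
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score
```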
2.3. Integration of Multi-Head Attention with EmoBERTa-X
- Attention Mechanism Configuration: The multi-head attention uses an embedding dimension matched to RoBERTa’s hidden size. Its $h$ parallel attention heads simultaneously compute attention scores over different parts of the input sequence; this parallelism helps the model capture the diverse linguistic patterns and emotional cues present in social media text.
- Mathematical Formulation: The attention mechanism computes the scores using the query, key, and value matrices derived from the hidden states output by RoBERTa. For each attention head $i$, the attention scores are computed as [38]:

$$\text{head}_i = \text{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$

where:
- $\text{head}_i$ represents the output of the $i$-th attention head after applying the softmax function to normalize the scores,
- the softmax function normalizes the attention scores, ensuring that they represent the relevance of each token in the sequence relative to the others,
- $Q_i$, $K_i$, and $V_i$ are the query, key, and value projections for the $i$-th attention head,
- $d_k$ is the dimension of the key vectors.
- Combining Attention Heads: The outputs from all attention heads are concatenated into a unified representation, allowing the model to jointly combine the emotional and contextual features each head captures for the input sequence. This combined output then undergoes a linear transformation by a weight matrix to align it with the original embedding dimension [38], integrating the information captured by the individual heads into a single representation that carries both syntactic and semantic information.
- Dropout and Pooling Operations: A dropout layer is applied to the output of the multi-head attention mechanism as a regularization technique to prevent overfitting and improve the model's generalizability. It randomly disables a fraction of the attention units during training so that no single pathway is relied upon too heavily. The model then performs average pooling along the sequence dimension, averaging the attended token representations into one pooled vector that emphasizes the emotional signals captured by the attention heads and is suitable for classification.
- Final Classification Layer: The pooled output acts as a compact, enriched representation of the sequence and is fed into a fully connected classification layer. This linear transformation maps the pooled vector to the emotion labels, so the model emits predictions based on the attended information. Using BCEWithLogitsLoss as the loss function suits multilabel classification, treating each emotion label as a separate binary classification (see the sketch below).
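The PyTorch sketch below assembles the pipeline as described: scaled dot-product attention per head (as in the formulation above), then RoBERTa hidden states fed through multi-head attention, dropout, average pooling, and a linear classifier trained with BCEWithLogitsLoss. RoBERTa-base’s hidden size of 768, the 8 heads from the ablation study, seven output labels (six Ekman categories plus neutral, per the results tables), and a 0.1 dropout rate are assumptions for illustration; the class and function names are ours, not the authors’ code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import RobertaModel

def attention_head(Q, K, V):
    """One head: softmax(Q K^T / sqrt(d_k)) V, as in the formulation above."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # token-to-token relevance
    return F.softmax(scores, dim=-1) @ V           # weighted sum of values

class EmotionHead(nn.Module):
    """RoBERTa -> multi-head attention -> dropout -> mean pooling -> classifier."""
    def __init__(self, num_labels: int = 7, num_heads: int = 8, dropout: float = 0.1):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size          # 768 for roberta-base
        self.attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden, num_labels)   # one logit per emotion

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        attended, _ = self.attn(h, h, h)                  # self-attention over tokens
        pooled = self.dropout(attended).mean(dim=1)       # average over the sequence
        return self.classifier(pooled)                    # logits for BCEWithLogitsLoss

# Multilabel loss: each emotion label is an independent binary decision.
# loss = nn.BCEWithLogitsLoss()(logits, multi_hot_targets.float())
```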
2.4. Advanced Modification of the DES Framework
2.4.1. Context-Sensitive Classifier Selection
Classifier competence is assessed with a label-wise accuracy measure:

$$\text{competence}(c) = \frac{1}{N L} \sum_{i=1}^{N} \sum_{j=1}^{L} \mathbb{1}\left(y_{ij} = \hat{y}_{ij}\right)$$

where:
- $N$ is the number of samples,
- $L$ is the number of labels,
- $y_{ij}$ and $\hat{y}_{ij}$ are the true and predicted labels (1 or 0) for the $j$-th label of the $i$-th sample,
- $\mathbb{1}(\cdot)$ is an indicator function that is 1 if its argument is true and 0 otherwise.
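A minimal NumPy sketch of this measure; evaluating it over a local validation region for each classifier is an assumption about how the competence score is applied, and the function name is ours.

```python
import numpy as np

def labelwise_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """(1 / (N * L)) * sum_ij 1(y_ij == yhat_ij) over an (N, L) multi-hot matrix."""
    N, L = y_true.shape
    return float(np.sum(y_true == y_pred)) / (N * L)

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])
print(labelwise_accuracy(y_true, y_pred))  # 5 of 6 label slots correct -> 0.8333
```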
2.4.2. Multi-Contextual Selector Module
- Text Length: Longer texts may provide more emotional context, while shorter texts require more focused attention.
- Ambiguity: The use of informal language, slang, and abbreviations often creates ambiguity in how emotions are expressed. The MCSM evaluates the degree of ambiguity using a custom Ambiguity Coefficient (AC) [40]:

$$AC(x) = \frac{SC(x)}{N_w(x) + \epsilon}$$

where:
- $N_w(x)$ represents the number of words in the input text $x$,
- $SC(x)$ measures the average semantic complexity of the words based on their embeddings and contextual information,
- $\epsilon$ is a small constant to avoid division by zero.
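The sketch below computes the coefficient as reconstructed above. Using the dispersion of contextual embeddings around their centroid as the semantic-complexity measure $SC(x)$ is our stand-in, since the paper does not specify how $SC$ is computed.

```python
import numpy as np

def ambiguity_coefficient(word_vecs: np.ndarray, eps: float = 1e-8) -> float:
    """AC(x) = SC(x) / (N_w(x) + eps) for word_vecs of shape (n_words, dim)."""
    n_words = word_vecs.shape[0]
    centroid = word_vecs.mean(axis=0)
    # Stand-in SC(x): average distance of word embeddings from their centroid.
    sc = float(np.mean(np.linalg.norm(word_vecs - centroid, axis=1)))
    return sc / (n_words + eps)
```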
2.4.3. Weighted Balanced Sampling
- Balanced Batch Construction: Each training batch is formed by sampling examples so that the class distribution within the batch is far more balanced. Underrepresented classes receive higher weights, and therefore a higher probability of being sampled, making them much more frequent in training batches (a sampler sketch follows this list).
- Model Diversity and Learning: Applying weighted balanced sampling across all constituent models of the ensemble means the DES framework can draw on models that have learned to handle both common and rare classes correctly, yielding a positive combined effect in the ensemble's voting.
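One common way to realize this in PyTorch is a `WeightedRandomSampler` with per-sample weights derived from inverse class frequencies; the weighting scheme below (each sample weighted by its rarest positive label) is a plausible sketch, not necessarily the authors' exact rule.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# labels: (N, L) multi-hot matrix of emotion annotations (random here for demo).
labels = np.random.randint(0, 2, size=(1000, 7))
class_freq = labels.sum(axis=0) + 1                 # add-one smoothing avoids /0
sample_weights = (labels / class_freq).max(axis=1)  # rarest positive label dominates
sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(sample_weights),
    replacement=True,
)
# Pass to the loader: DataLoader(dataset, batch_size=32, sampler=sampler)
```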
2.4.4. Competence-Based Weighting and Aggregation
The outputs of the selected classifiers are combined as:

$$\hat{y} = \frac{\sum_{c \in C^*} w_c \, \hat{y}_c}{\sum_{c \in C^*} w_c}$$

where:
- $\hat{y}_c$ represents the prediction of classifier $c$,
- $w_c$ is the competence-based weight assigned to classifier $c$,
- $\hat{y}$ is the final prediction obtained by averaging the weighted outputs of the classifiers.
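A minimal sketch of the aggregation, assuming per-label probabilities from each selected classifier and a 0.5 decision threshold (the threshold is our assumption):

```python
import numpy as np

def aggregate(preds: np.ndarray, weights: np.ndarray, threshold: float = 0.5):
    """Competence-weighted average of (n_classifiers, L) label probabilities."""
    fused = weights @ preds / weights.sum()  # (L,) weighted mean per label
    return (fused >= threshold).astype(int)  # independent multilabel decisions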
2.5. Model Evaluation
The micro- and macro-averaged F1-scores are defined as:

$$\text{Micro-}F_1 = \frac{2 \sum_{j=1}^{L} TP_j}{2 \sum_{j=1}^{L} TP_j + \sum_{j=1}^{L} FP_j + \sum_{j=1}^{L} FN_j}, \qquad \text{Macro-}F_1 = \frac{1}{L} \sum_{j=1}^{L} \frac{2\, TP_j}{2\, TP_j + FP_j + FN_j}$$

where:
- $TP_j$, $FP_j$, and $FN_j$ are the true positives, false positives, and false negatives for label $j$, respectively,
- $n_j$ is the number of true instances for label $j$.
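These metrics can be computed directly on multilabel indicator matrices, for example with scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print(f1_score(y_true, y_pred, average="micro"))  # pools TP/FP/FN across labels
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-label F1
print(hamming_loss(y_true, y_pred))               # fraction of incorrect label slots
```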
3. Experimentation and Results
3.1. GoEmotions Dataset
3.1.1. Label Mapping
- Labels such as anger and annoyance were grouped under the Anger category.
- Sadness and related emotions like disappointment were mapped to the Sadness category.
- Positive emotions like happiness, amusement, and excitement were categorized under Joy.
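The grouping can be implemented as a simple lookup, grounded in the Ekman-mapping table reported with the dataset description; the function name is illustrative.

```python
# GoEmotions -> Ekman mapping, as listed in the paper's mapping table.
EKMAN_MAP = {
    "anger": "Anger", "annoyance": "Anger", "disapproval": "Anger",
    "disgust": "Disgust",
    "fear": "Fear", "nervousness": "Fear",
    "joy": "Joy", "amusement": "Joy", "approval": "Joy", "excitement": "Joy",
    "gratitude": "Joy", "love": "Joy", "optimism": "Joy", "relief": "Joy",
    "pride": "Joy", "admiration": "Joy", "desire": "Joy", "caring": "Joy",
    "sadness": "Sadness", "disappointment": "Sadness", "embarrassment": "Sadness",
    "grief": "Sadness", "remorse": "Sadness",
    "surprise": "Surprise", "realization": "Surprise", "confusion": "Surprise",
    "curiosity": "Surprise",
}

def to_ekman(fine_labels):
    """Map fine-grained GoEmotions labels to deduplicated Ekman categories."""
    return sorted({EKMAN_MAP[l] for l in fine_labels if l in EKMAN_MAP})

print(to_ekman(["annoyance", "curiosity", "disapproval"]))  # ['Anger', 'Surprise']
```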
3.1.2. GoEmotions Preprocessing
3.2. Experimental Work
3.2.1. Handling Overlapping Emotions
- “I can’t believe I finally did it! I’m so proud and happy”
  - Ground Truth: Joy, Pride
  - Model Prediction (without DES): Joy
  - Model Prediction (with DES): Joy, Pride
3.2.2. Ablation Study
- Experiment 1: Baseline RoBERTa. The goal of this first experiment was to establish baseline performance by fine-tuning RoBERTa with no changes to the model's layers or the dataset, providing a starting point for understanding the model's capability to handle multilabel emotion classification. The baseline model achieved an accuracy of 65.94%, a micro F1-score of 67.07%, a macro F1-score of 60.55%, and a Hamming loss of 0.0935. These metrics give an initial view of model performance in terms of both overall accuracy and the balance between precision and recall. The baseline performance was reasonable but hinted at limitations in modeling multilabel classification, particularly in achieving high recall on the less frequent emotion labels. This pointed to the need for further modifications to improve the model's handling of complicated multilabel data and to increase generalization across diverse emotional expressions.
- Experiment 2: RoBERTa plus Attention. In this experiment, multi-head attention with 8 attention heads was added to RoBERTa so the model could focus on the most relevant parts of the input, helping it grasp and elevate salient textual features and thereby improve multilabel classification. This modification improved accuracy to 66.14%, with a micro F1-score of 67.23%, a macro F1-score of 60.37%, and a Hamming loss of 0.0929. These results showed that attention introduced an improvement, reflected in the model's increased focus on key textual features.
- Experiment 3: DES using RoBERTa plus Attention. This experiment tested the integration of DES into EmoBERTa-X to dynamically choose the best subset of models during inference, with the objective of exploiting DES's adaptability to make more intelligent decisions based on input variability. After training four instances of the model, performance improved significantly: accuracy reached 73.79%, the micro F1-score 75.05%, the macro F1-score 69.08%, and the Hamming loss 0.0703. Allowing DES to dynamically select the models most relevant to each input significantly enhances the reliability of the model; this adaptiveness boosts overall performance, especially in complex multilabel tasks, underlining the power of ensemble strategies.
- Experiment 4: DES using RoBERTa plus Attention applying ASEM. This experiment embedded abbreviation expansion into the preprocessing step to normalize informal language and make it more interpretable for the model, helping it understand colloquial expressions present in the text data. With abbreviation expansion added, results improved slightly: accuracy reached 73.85% with a micro F1-score of 75.04% and a macro F1-score of 68.82%, while the Hamming loss held at 0.0704. This suggests that improved handling of abbreviations helped the model further refine its understanding of informal language, improving classification outcomes particularly on instances that abbreviations would otherwise have made harder to classify.
- Experiment 5: DES using RoBERTa plus Attention applying ASEM and Emoji Conversion. This experiment introduced emoji conversion into the preprocessing pipeline to represent the emotions carried by non-verbal symbols. The conversion was expected to add contextual depth to the texts, allowing the model to pick up subtler emotions that were missed earlier. Instead, converting emojis slightly reduced performance: accuracy dropped to 72.80%, the micro F1-score to 74.25%, and the Hamming loss rose to 0.0725. The results indicate that, in this setup, emojis add context only to a limited degree, and that poorly represented emoji data may contribute little to classification accuracy.
- Experiment 6: DES using RoBERTa plus Attention applying ASEM and TVNR. The goal of this experiment was to further refine preprocessing by adding contraction replacement, such as expanding “can’t” to “cannot”, serving the dual purpose of enhancing the model's parsing of complex structures and improving its interpretation of varied sentence forms. It yielded an accuracy of 73.73%, a micro F1-score of 75.01%, a macro F1-score of 68.96%, and a Hamming loss of 0.0704. Although contraction replacement did not introduce a significant jump in performance, it helped maintain the model's effectiveness by improving its understanding of text containing contractions, showing that fine-tuning preprocessing helps the model deal with real-world text variation.
- Experiment 7: Addition of WBS during Training. The last experiment tackled the class-imbalance problem at training time with weighted balanced sampling, ensuring the model received more balanced examples of all classes (with special attention to the weaker classes) so that it could generalize better. This produced the best performance of all experiments: an accuracy of 75.52%, a micro F1-score of 76.10%, a macro F1-score of 70.13%, and the lowest Hamming loss, 0.0679. Weighted balanced sampling significantly improved recall on the low-frequency classes while keeping performance on par across all classes, clearly establishing class balancing as an integral part of top-performing multilabel classification.
- Class Imbalance: As seen in Figure 4, class imbalance can bias a model toward the more frequent labels, making it generalize poorly to the less frequent emotions. WBS counteracts this by sampling the underrepresented emotions more often during training, ensuring better generalization across both frequent and rare emotion labels, as shown in Table 5. The table compares recall and precision for frequent and rare emotion classes with and without WBS. WBS significantly improves recall for rare emotions such as disgust (+5.8) and fear (+2.1) while also enhancing precision. For frequent emotions such as joy and neutrality, performance is maintained or slightly improved, with notable gains in precision (+1.5 for joy and +6.6 for neutrality). This highlights WBS's effectiveness for multilabel emotion-classification tasks, especially under severe class imbalance.
- Short Texts and Ambiguity: Many posts in GoEmotions are short and often ambiguous, making the intended emotional content difficult to capture. By exploiting EmoBERTa-X's multi-head attention mechanism, the model can better disambiguate the meaning of short texts and extract their emotional context more precisely.
- Multilabel Classification Complexity: An important challenge of the GoEmotions dataset is that emotion classification is multilabel, meaning one instance can carry more than one emotion; this was handled successfully by introducing DES.
- Handling Informal Language: The social media dataset contains extensive informal language, slang, abbreviations, and nonstandard grammar. Abbreviation expansion, together with the other preprocessing in EmoBERTa-X, allows the model to better interpret slang and informal language in this dataset.
3.3. Comparison to the State-of-the-Art Models
Model | Accuracy (%) | Micro F1 (%) | Macro F1 (%) | Key Observation |
---|---|---|---|---|
UCCA-GAT [43] | 71.2 | 75.4 | 63.9 | Lower macro F1 highlights poor handling of rare emotions. |
Dep-GAT [43] | 68.7 | 74.7 | 61.1 | Lacks robust contextual adaptation. |
BERT [45] | - | - | 64.0 | Basic transformer model without optimizations. |
RoBERTa [44] | 65.9 | 69.1 | 61.8 | Improved over BERT but struggles with multilabel tasks. |
Dim-RoBERTa [44] | 65.7 | 68.6 | 61.0 | Employs dimensionality reduction, enhancing efficiency but still struggles with rare emotional categories and overlapping emotions. |
Proposed EmoBERTa-X | 75.5 | 76.1 | 70.1 | Excels in handling informal text, balances performance across all emotion classes with dynamic ensemble selection. |
4. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- García-Hernández, R.A.; Luna-García, H.; Celaya-Padilla, J.M.; García-Hernández, A.; Reveles-Gómez, L.C.; Flores-Chaires, L.A.; Delgado-Contreras, J.R.; Rondon, D.; Villalba-Condori, K.O. A Systematic Literature Review of Modalities, Trends, and Limitations in Emotion Recognition, Affective Computing, and Sentiment Analysis. Appl. Sci. 2024, 14, 7165. [Google Scholar] [CrossRef]
- Hanna, R.; Rohm, A.; Crittenden, V.L. We’re all connected: The power of the social media ecosystem. Bus. Horizons 2011, 54, 265–273. [Google Scholar] [CrossRef]
- Tawfik, A.; Elkhodary, H.O.; Saleh, S.N. A Deep Learning-based Emotion Recognition System for Interactive E-Learning. In Proceedings of the 2022 32nd International Conference on Computer Theory and Applications (ICCTA), Alexandria, Egypt, 17–19 December 2022; pp. 38–43. [Google Scholar] [CrossRef]
- Brynielsson, J.; Johansson, F.; Jonsson, C.; Westling, A. Emotion classification of social media posts for estimating people’s reactions to communicated alert messages during crises. Secur. Inform. 2014, 3, 7. [Google Scholar] [CrossRef]
- Sharma, A.; Sharma, K.; Kumar, A. Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion. Neural Comput. Appl. 2023, 35, 22935–22948. [Google Scholar] [CrossRef]
- Kusal, S.; Patil, S.; Kotecha, K.; Aluvalu, R.; Varadarajan, V. AI Based Emotion Detection for Textual Big Data: Techniques and Contribution. Big Data Cogn. Comput. 2021, 5, 43. [Google Scholar] [CrossRef]
- Mansoor, M.A.; Ansari, K.H. Early Detection of Mental Health Crises through Artificial-Intelligence-Powered Social Media Analysis: A Prospective Observational Study. J. Pers. Med. 2024, 14, 958. [Google Scholar] [CrossRef]
- Asghar, M.Z.; Khan, A.; Bibi, A.; Kundi, F.M.; Ahmad, H. Sentence-level emotion detection framework using rule-based classification. Cogn. Comput. 2017, 9, 868–894. [Google Scholar] [CrossRef]
- Berka, P. Sentiment analysis using rule-based and case-based reasoning. J. Intell. Inf. Syst. 2020, 55, 51–66. [Google Scholar] [CrossRef]
- Wang, L.; Isomura, S.; Ptaszynski, M.; Dybala, P.; Urabe, Y.; Rzepka, R.; Masui, F. The Limits of Words: Expanding a Word-Based Emotion Analysis System with Multiple Emotion Dictionaries and the Automatic Extraction of Emotive Expressions. Appl. Sci. 2024, 14, 4439. [Google Scholar] [CrossRef]
- Öhman, E. The validity of lexicon-based sentiment analysis in interdisciplinary research. In Proceedings of the Workshop on Natural Language Processing for Digital Humanities, Silchar, India, 16–19 December 2021; pp. 7–12. [Google Scholar]
- Nandwani, P.; Verma, R. A review on sentiment analysis and emotion detection from text. Soc. Netw. Anal. Min. 2021, 11, 81. [Google Scholar] [CrossRef] [PubMed]
- Sujanaa, J.; Palanivel, S.; Balasubramanian, M. Emotion recognition using support vector machine and one-dimensional convolutional neural network. Multimed. Tools Appl. 2021, 80, 27171–27185. [Google Scholar] [CrossRef]
- Semary, N.A.; Ahmed, W.; Amin, K.; Pławiak, P.; Hammad, M. Enhancing machine learning-based sentiment analysis through feature extraction techniques. PLoS ONE 2024, 19, e0294968. [Google Scholar] [CrossRef] [PubMed]
- Sarsam, S.M.; Al-Samarraie, H.; Alzahrani, A.I.; Wright, B. Sarcasm detection using machine learning algorithms in Twitter: A systematic review. Int. J. Mark. Res. 2020, 62, 578–598. [Google Scholar] [CrossRef]
- Bouazizi, M.; Ohtsuki, T.O. A pattern-based approach for sarcasm detection on twitter. IEEE Access 2016, 4, 5477–5488. [Google Scholar] [CrossRef]
- Shiri, F.M.; Perumal, T.; Mustapha, N.; Mohamed, R. A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU. arXiv 2023, arXiv:2305.17473. [Google Scholar]
- Iyer, A.; Das, S.S.; Teotia, R.; Maheshwari, S.; Sharma, R.R. CNN and LSTM based ensemble learning for human emotion recognition using EEG recordings. Multimed. Tools Appl. 2023, 82, 4883–4896. [Google Scholar] [CrossRef]
- Chen, M. Emotion analysis based on deep learning with application to research on development of Western culture. Front. Psychol. 2022, 13, 911686. [Google Scholar] [CrossRef]
- Bodapati, S.; Bandarupally, H.; Shaw, R.N.; Ghosh, A. Comparison and analysis of RNN-LSTMs and CNNs for social reviews classification. In Advances in Applications of Data-Driven Computing; Springer: Singapore, 2021; pp. 49–59. [Google Scholar]
- Liu, N.; Ren, F. Emotion classification using a CNN_LSTM-based model for smooth emotional synchronization of the humanoid robot REN-XIN. PLoS ONE 2019, 14, e0215216. [Google Scholar] [CrossRef]
- Acheampong, F.A.; Nunoo-Mensah, H.; Chen, W. Transformer models for text-based emotion detection: A review of BERT-based approaches. Artif. Intell. Rev. 2021, 54, 5789–5829. [Google Scholar] [CrossRef]
- Rezapour, M. Emotion Detection with Transformers: A Comparative Study. arXiv 2024, arXiv:2403.15454. [Google Scholar]
- Ganie, A.G. Presence of informal language, such as emoticons, hashtags, and slang, impact the performance of sentiment analysis models on social media text? arXiv 2023, arXiv:2301.12303. [Google Scholar]
- Aliyu, Y.; Sarlan, A.; Danyaro, K.U.; Rahman, A.S. Comparative Analysis of Transformer Models for Sentiment Analysis in Low-Resource Languages. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 353. [Google Scholar] [CrossRef]
- Ramaswamy, S.L.; Chinnappan, J. RecogNet-LSTM+ CNN: A hybrid network with attention mechanism for aspect categorization and sentiment classification. J. Intell. Inf. Syst. 2022, 58, 379–404. [Google Scholar] [CrossRef]
- Ramirez-Alcocer, U.M.; Tello-Leal, E.; Hernandez-Resendiz, J.D.; Romero, G. A Hybrid CNN-LSTM Approach for Sentiment Analysis. In Proceedings of the Congress on Intelligent Systems, Bengaluru, India, 4–5 September 2023; Springer: Singapore, 2023; pp. 425–437. [Google Scholar]
- Saleh, S.N. Enhancing multilabel classification for unbalanced COVID-19 vaccination hesitancy tweets using ensemble learning. Comput. Biol. Med. 2025, 184, 109437. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; Wang, G.; Kong, H. Emotion Recognition Based on Dynamic Ensemble Feature Selection. In Man-Machine Interactions; Springer: Berlin/Heidelberg, Germany, 2009; pp. 217–225. [Google Scholar]
- Costa, J.; Silva, C.; Antunes, M.; Ribeiro, B. Boosting dynamic ensemble’s performance in twitter. Neural Comput. Appl. 2020, 32, 10655–10667. [Google Scholar] [CrossRef]
- Pan, B.; Hirota, K.; Jia, Z.; Zhao, L.; Jin, X.; Dai, Y. Multimodal emotion recognition based on feature selection and extreme learning machine in video clips. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 1903–1917. [Google Scholar] [CrossRef]
- Patwardhan, N.; Marrone, S.; Sansone, C. Transformers in the real world: A survey on nlp applications. Information 2023, 14, 242. [Google Scholar] [CrossRef]
- Chen, X.; Yin, Y.; Feng, T. Multi-Label Text Classification Based on BERT and Label Attention Mechanism. In Proceedings of the 2023 Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), Dalian, China, 14–16 April 2023; pp. 386–390. [Google Scholar]
- Yuan, L.; Xu, X.; Sun, P.; Yu, H.P.; Wei, Y.Z.; Zhou, J.J. Research of multi-label text classification based on label attention and correlation networks. PLoS ONE 2024, 19, e0311305. [Google Scholar] [CrossRef]
- Zhou, Z.H. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
- Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39. [Google Scholar] [CrossRef]
- Lowe, S. RoBERTa-Base Model on GoEmotions. 2022. Available online: https://huggingface.co/SamLowe/roberta-base-go_emotions (accessed on 4 November 2024).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
- McNamara, D.S.; Graesser, A.C.; McCarthy, P.M.; Cai, Z. Automated Evaluation of Text and Discourse with Coh-Metrix; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
- Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
- Ekman, P. Are there basic emotions? Psychol. Rev. 1992, 99, 550–553. [Google Scholar] [CrossRef] [PubMed]
- Ameer, I.; Bölücü, N.; Sidorov, G.; Can, B. Emotion classification in texts over graph neural networks: Semantic representation is better than syntactic. IEEE Access 2023, 11, 56921–56934. [Google Scholar] [CrossRef]
- Ameer, I.; Bölücü, N.; Siddiqui, M.H.F.; Can, B.; Sidorov, G.; Gelbukh, A. Multi-label emotion classification in texts using transfer learning. Expert Syst. Appl. 2023, 213, 118534. [Google Scholar] [CrossRef]
- Demszky, D.; Movshovitz-Attias, D.; Ko, J.; Cowen, A.; Nemade, G.; Ravi, S. GoEmotions: A dataset of fine-grained emotions. arXiv 2020, arXiv:2005.00547. [Google Scholar]
Ekman Mapping | Emotion(s) |
---|---|
Anger | Anger, Annoyance, Disapproval |
Disgust | Disgust |
Fear | Fear, Nervousness |
Joy | Joy, Amusement, Approval, Excitement, Gratitude, Love, Optimism, Relief, Pride, Admiration, Desire, Caring |
Sadness | Sadness, Disappointment, Embarrassment, Grief, Remorse |
Surprise | Surprise, Realization, Confusion, Curiosity |
Text | Emotion | Ekman Emotion |
---|---|---|
This is so bad that I immediately retold it to everyone I know. | Disappointment, Embarrassment | Sadness |
I didn’t read that but so what? | Annoyance, Curiosity, Disapproval | Anger, Surprise |
Happy to be able to help. | Joy | Joy |
Before Preprocessing | After Preprocessing |
---|---|
Nice job building yourself btw | Nice job building yourself by the way |
Lol it’s a bit of both I think. | laugh out loud it’s a bit of both I think. |
omg, poor little bean | oh my god, poor little bean |
Experiment ID | Experiment Description | Accuracy (%) | Micro F1 (%) | Macro F1 (%) | Hamming Loss |
---|---|---|---|---|---|
1 | Baseline Model, without preprocessing | 65.94 | 67.07 | 60.55 | 0.0935 |
2 | Adding Multi-head Attention | 66.14 | 67.23 | 60.37 | 0.0929 |
3 | Applying DES | 73.79 | 75.05 | 69.08 | 0.0703 |
4 | Addition of ASEM | 73.85 | 75.04 | 68.82 | 0.0704 |
5 | With Emoji Conversion | 72.80 | 74.25 | 68.39 | 0.0725 |
6 | Adding TVNR | 73.73 | 75.01 | 68.96 | 0.0704 |
7 | Proposed Model (EmoBERTa-X) | 75.52 | 76.10 | 70.13 | 0.0679 |
Emotion Category | Emotion | Without WBS (P/R) | With WBS (P/R) | Improvement (P/R) |
---|---|---|---|---|
Frequent | Joy | 88.2/82.4 | 89.7/83.9 | +1.5/+1.5
Frequent | Neutral | 57.5/76.8 | 64.1/79.6 | +6.6/+2.8
Frequent | Sadness | 68.3/70.0 | 71.8/71.1 | +3.5/+1.1
Rare | Anger | 61.5/64.8 | 66.1/68.9 | +4.6/+4.1
Rare | Disgust | 51.3/56.7 | 58.5/62.5 | +7.2/+5.8
Rare | Fear | 75.9/63.2 | 81.0/65.3 | +5.1/+2.1
Rare | Surprise | 72.3/57.0 | 77.0/59.3 | +4.7/+2.3
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).