Article

Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning

by Diego Resende Faria 1,*, Abraham Itzhak Weinberg 2 and Pedro Paulo Ayrosa 3

1 School of Physics, Engineering and Computer Science, University of Hertfordshire, Hertfordshire AL10 9AB, UK
2 AI-Weinberg AI Experts, Tel Aviv 90850, Israel
3 LABTED and Computer Science Department, State University of Londrina, Londrina 86057-970, Brazil
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6631; https://doi.org/10.3390/app14156631
Submission received: 3 July 2024 / Revised: 24 July 2024 / Accepted: 26 July 2024 / Published: 29 July 2024

Abstract

Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), a multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it with pre-trained models such as bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is a novel two-layered DBMM (2L-DBMM) for multimodal fusion, which integrates speech emotion and text sentiment to classify more nuanced, second-level emotional states. Evaluated on the EmoUERJ (Portuguese) and ESD (English) datasets, respectively, the extended DBMM achieves accuracy rates of 96% and 98% for SER and 85% and 95% for SA, while the 2L-DBMM reaches 96% and 98% for combined emotion classification. Our findings show that the extended DBMM outperforms individual classifiers within each modality and that the 2L-DBMM effectively merges the two modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. These results underscore the potential of our approach for enhancing emotional understanding, with broad applications in fields such as mental health assessment, human–robot interaction, and cross-cultural communication.
Keywords: speech emotion recognition; sentiment analysis; affective communication; data fusion; multimodality; machine learning; deep learning; dynamic Bayesian mixture model
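The abstract's core fusion idea can be illustrated with a short sketch. DBMM-style ensembles fuse the posteriors of several base classifiers as a weighted mixture, P(C_k | A) proportional to the sum over i of w_i * P_i(C_k | A); the 2L-DBMM applies this fusion twice, first within each modality and then across modalities. The Python sketch below is a minimal illustration under stated assumptions: the static entropy-based weighting, the example posterior values, and the shared second-level label space are all illustrative simplifications rather than the paper's exact formulation (the published DBMM updates its weights dynamically).

import numpy as np

def entropy_weights(posteriors):
    """Weight each base classifier by its confidence (inverse normalized entropy).

    posteriors: (n_classifiers, n_classes) array, one posterior row per classifier.
    NOTE: the published DBMM updates weights dynamically over time; this
    static entropy weighting is a simplifying assumption for illustration.
    """
    eps = 1e-12
    H = -np.sum(posteriors * np.log(posteriors + eps), axis=1)  # per-classifier entropy
    conf = 1.0 - H / np.log(posteriors.shape[1])                # 1 = certain, 0 = uniform
    return conf / conf.sum()

def dbmm_fuse(posteriors):
    """One DBMM fusion layer: weighted mixture of posteriors, renormalized."""
    w = entropy_weights(posteriors)
    fused = w @ posteriors
    return fused / fused.sum()

# Hypothetical per-classifier posteriors over three illustrative classes.
# Speech branch (e.g., GNB, SVM, RF, MLP, 1D-CNN):
speech_post = np.array([[0.70, 0.20, 0.10],
                        [0.60, 0.25, 0.15],
                        [0.80, 0.10, 0.10],
                        [0.55, 0.30, 0.15],
                        [0.75, 0.15, 0.10]])
# Text branch (e.g., BERT, GPT-2, logistic regression):
text_post = np.array([[0.50, 0.40, 0.10],
                      [0.65, 0.25, 0.10],
                      [0.55, 0.35, 0.10]])

emotion = dbmm_fuse(speech_post)     # layer 1: fuse within the speech modality
sentiment = dbmm_fuse(text_post)     # layer 1: fuse within the text modality

# Layer 2 (2L-DBMM): treat each modality's fused posterior as a base
# classifier and fuse across modalities. A shared second-level label
# space is assumed here, which simplifies the paper's actual mapping.
second_level = dbmm_fuse(np.vstack([emotion, sentiment]))
print(second_level)

The key design choice this sketch captures is that confident (low-entropy) base classifiers dominate the mixture, so a weak model cannot drag down the fused posterior as it would in plain averaging.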
