Search Results (32)

Search Parameters:
Keywords = multimodal sentiment classification

22 pages, 2498 KB  
Article
SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability
by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen
Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 - 2 Aug 2025
Viewed by 503
Abstract
How do humans analyze the sentiments embedded in text? When attempting to analyze a text, humans construct a “scene” in their minds through imagination based on the text, generating a vague image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can only analyze based on existing information in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable diffusion, enabling the model to generate corresponding visual scenes from input text, thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extraction model, to extract aligned features from different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse information from different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we set up experiments to deeply analyze the model’s advantages and limitations, providing a new technical path for follow-up research. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
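As a rough illustration of the generate-then-fuse pipeline this abstract describes (text-to-image generation, CLIP feature extraction for both modalities, late fusion), the sketch below wires together off-the-shelf components. It is not the authors' implementation; the model identifiers, classifier heads, and equal fusion weights are assumptions.

```python
# Sketch: text -> Stable Diffusion "scene" image -> CLIP features -> late fusion.
import torch
import torch.nn as nn
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

text = "The rain would not stop, and neither would the tears."
image = sd(text, num_inference_steps=25).images[0]        # imagined "scene" for the text

inputs = proc(text=[text], images=image, return_tensors="pt",
              padding=True, truncation=True).to(device)
with torch.no_grad():
    t_feat = clip.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])   # (1, 512)
    v_feat = clip.get_image_features(pixel_values=inputs["pixel_values"])      # (1, 512)

num_classes = 2
text_head = nn.Linear(512, num_classes).to(device)        # unimodal classifier heads
image_head = nn.Linear(512, num_classes).to(device)
logits = 0.5 * text_head(t_feat) + 0.5 * image_head(v_feat)   # late fusion: average logits
print(logits.softmax(dim=-1))
```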

16 pages, 6601 KB  
Article
Dynamic Tuning and Multi-Task Learning-Based Model for Multimodal Sentiment Analysis
by Yi Liang, Turdi Tohti, Wenpeng Hu, Bo Kong, Dongfang Han, Tianwei Yan and Askar Hamdulla
Appl. Sci. 2025, 15(11), 6342; https://doi.org/10.3390/app15116342 - 5 Jun 2025
Viewed by 795
Abstract
Multimodal sentiment analysis aims to uncover human affective states by integrating data from multiple sensory sources. However, previous studies have focused on optimizing model architecture, neglecting the impact of objective function settings on model performance. To address this, this study introduces a new framework, DMMSA, which utilizes the intrinsic correlation of sentiment signals and enhances the model’s understanding of complex sentiments. DMMSA incorporates coarse-grained sentiment analysis to reduce task complexity. Meanwhile, it embeds a contrastive learning mechanism within each modality, which decomposes unimodal features into similar and dissimilar ones, thus allowing for the simultaneous consideration of both unimodal and multimodal emotions. We tested DMMSA on the CH-SIMS, MOSI, and MOSEI datasets. When only changing the optimization objectives, DMMSA achieved accuracy gains of 3.2%, 1.57%, and 1.95% over the baseline in five-class and seven-class classification tasks. In regression tasks, DMMSA reduced the Mean Absolute Error (MAE) by 1.46%, 1.5%, and 2.8% compared to the baseline. Full article
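A toy sketch of the kind of multi-task objective described above: a fine-grained regression loss, an auxiliary coarse-grained polarity loss, and an intra-modal contrastive term. This is not the DMMSA code; the loss weights, dimensions, and the exact contrastive formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def intra_modal_contrastive(z, polarity, temperature=0.1):
    """Pull together unimodal features with the same coarse polarity, push apart the rest."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature                       # (B, B) cosine similarities
    mask = polarity.unsqueeze(0).eq(polarity.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                              # ignore self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (mask * log_prob).sum(1) / mask.sum(1).clamp(min=1)
    return -pos.mean()

B = 8
text_feat = torch.randn(B, 128)                         # stand-in unimodal features
score_pred, score_true = torch.randn(B), torch.randn(B) # fine-grained sentiment scores
coarse_logits = torch.randn(B, 3)                       # negative / neutral / positive
coarse_label = (score_true.sign() + 1).long()           # coarse labels derived from scores

loss = (F.l1_loss(score_pred, score_true)               # fine-grained regression (MAE)
        + 0.5 * F.cross_entropy(coarse_logits, coarse_label)   # coarse auxiliary task
        + 0.1 * intra_modal_contrastive(text_feat, coarse_label))
print(loss.item())
```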

27 pages, 8770 KB  
Article
Evaluation of Rural Visual Landscape Quality Based on Multi-Source Affective Computing
by Xinyu Zhao, Lin Lin, Xiao Guo, Zhisheng Wang and Ruixuan Li
Appl. Sci. 2025, 15(9), 4905; https://doi.org/10.3390/app15094905 - 28 Apr 2025
Viewed by 767
Abstract
Assessing the visual quality of rural landscapes is pivotal for quantifying ecological services and preserving cultural heritage; however, conventional ecological indicators neglect emotional and cognitive dimensions. To address this gap, the present study proposes a novel visual quality assessment method for rural landscapes that integrates multimodal sentiment classification models to strengthen sustainability metrics. Four landscape types were selected from three representative villages in Dalian City, China, and the physiological signals (EEG, EOG) and subjective evaluations (Beauty Assessment and SAM Scales) of students and teachers were recorded. Binary, ternary, and five-category emotion classification models were then developed. Results indicate that the binary and ternary models achieve superior accuracy in emotional valence and arousal, whereas the five-category model performs least effectively. Furthermore, an ensemble learning approach outperforms individual classifiers in both binary and ternary tasks, yielding a 16.54% increase in mean accuracy. Integrating subjective and objective data further enhances ternary classification accuracy by 7.7% compared to existing studies, confirming the value of multi-source features. These findings demonstrate that a multi-source sentiment computing framework can serve as a robust quantitative tool for evaluating emotional quality in rural landscapes and promoting their sustainable development. Full article
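For the ensemble step reported above, a generic soft-voting ensemble over concatenated multi-source features (physiological statistics plus subjective scale scores) might look like the sketch below. The synthetic data, feature layout, and classifier choices are placeholders, not the study's protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_physio = rng.normal(size=(200, 32))      # stand-in EEG/EOG features per stimulus
X_subj = rng.normal(size=(200, 4))         # stand-in beauty / SAM scale ratings
X = np.hstack([X_physio, X_subj])          # multi-source feature vector
y = rng.integers(0, 3, size=200)           # ternary valence labels (placeholder)

ensemble = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",                          # average class probabilities across members
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```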

20 pages, 1112 KB  
Article
Multimodal Emotion Recognition Method Based on Domain Generalization and Graph Neural Networks
by Jinbao Xie, Yulong Wang, Tianxin Meng, Jianqiao Tai, Yueqian Zheng and Yury I. Varatnitski
Electronics 2025, 14(5), 885; https://doi.org/10.3390/electronics14050885 - 23 Feb 2025
Viewed by 2212
Abstract
In recent years, multimodal sentiment analysis has attracted increasing attention from researchers owing to the rapid development of human–computer interactions. Sentiment analysis is an important task for understanding dialogues. However, with the increase of multimodal data, the processing of individual modality features and the methods for multimodal feature fusion have become more significant for research. Existing methods that handle the features of each modality separately are not suitable for subsequent multimodal fusion and often fail to capture sufficient global and local information. Therefore, this study proposes a novel multimodal sentiment analysis method based on domain generalization and graph neural networks. The main characteristic of this method is that it considers the features of each modality as domains. It extracts domain-specific and cross-domain-invariant features, thereby facilitating cross-domain generalization. Generalized features are more suitable for multimodal fusion. Graph neural networks were employed to extract global and local information from the dialogue to capture the emotional changes of the speakers. Specifically, global representations were captured by modeling cross-modal interactions at the dialogue level, whereas local information was typically inferred from temporal information or the emotional changes of the speakers. The method proposed in this study outperformed existing models on the IEMOCAP, CMU-MOSEI, and MELD datasets by 0.97%, 1.09% (for seven-class classification), and 0.65% in terms of weighted F1 score, respectively. This clearly demonstrates that the domain-generalized features proposed in this study are better suited for subsequent multimodal fusion, and that the model developed here is more effective at capturing both global and local information. Full article
(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)
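The sketch below illustrates the two ideas named in the abstract, under stated assumptions: each modality is split into a shared (cross-domain-invariant) and a private (domain-specific) representation with an orthogonality penalty, and utterance features are then propagated over a simple dialogue graph. Layer sizes, the penalty, and the graph construction are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ModalitySplit(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim)    # invariant across modalities
        self.private = nn.Linear(in_dim, out_dim)   # specific to this modality

    def forward(self, x):
        s, p = self.shared(x), self.private(x)
        ortho = (nn.functional.normalize(s, dim=-1) *
                 nn.functional.normalize(p, dim=-1)).sum(-1).pow(2).mean()
        return s, p, ortho                          # penalty keeps the two parts distinct

def gcn_layer(h, adj, weight):
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return torch.relu((adj @ h) / deg @ weight)     # mean aggregation over neighbours

T, d = 6, 64                                        # utterances in a dialogue, feature size
text, audio = torch.randn(T, d), torch.randn(T, d)
split_t, split_a = ModalitySplit(d, 32), ModalitySplit(d, 32)
st, pt, o1 = split_t(text)
sa, pa, o2 = split_a(audio)
h = torch.cat([st + sa, pt, pa], dim=-1)            # fuse generalized + specific features

adj = (torch.arange(T).unsqueeze(0) - torch.arange(T).unsqueeze(1)).abs() <= 2
h = gcn_layer(h, adj.float(), torch.randn(96, 96))  # local temporal context in the dialogue
print(h.shape)
```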

33 pages, 2092 KB  
Article
SentimentFormer: A Transformer-Based Multimodal Fusion Framework for Enhanced Sentiment Analysis of Memes in Under-Resourced Bangla Language
by Fatema Tuj Johora Faria, Laith H. Baniata, Mohammad H. Baniata, Mohannad A. Khair, Ahmed Ibrahim Bani Ata, Chayut Bunterngchit and Sangwoo Kang
Electronics 2025, 14(4), 799; https://doi.org/10.3390/electronics14040799 - 18 Feb 2025
Cited by 1 | Viewed by 2923
Abstract
Social media has increasingly relied on memes as a tool for expressing opinions, making meme sentiment analysis an emerging area of interest for researchers. While much of the research has focused on English-language memes, under-resourced languages, such as Bengali, have received limited attention. Given the surge in social media use, the need for sentiment analysis of memes in these languages has become critical. One of the primary challenges in this field is the lack of benchmark datasets, particularly in languages with fewer resources. To address this, we used the MemoSen dataset, designed for Bengali, which consists of 4368 memes annotated with three sentiment labels: positive, negative, and neutral. MemoSen is divided into training (70%), test (20%), and validation (10%) sets, with an imbalanced class distribution: 1349 memes in the positive class, 2728 in the negative class, and 291 in the neutral class. Our approach leverages advanced deep learning techniques for multimodal sentiment analysis in Bengali, introducing three hybrid approaches. SentimentTextFormer is a text-based, fine-tuned model that utilizes state-of-the-art transformer architectures to accurately extract sentiment-related insights from Bengali text, capturing nuanced linguistic features. SentimentImageFormer is an image-based model that employs cutting-edge transformer-based techniques for precise sentiment classification through visual data. Lastly, SentimentFormer is a hybrid model that seamlessly integrates both text and image modalities using fusion strategies. Early fusion combines textual and visual features at the input level, enabling the model to jointly learn from both modalities. Late fusion merges the outputs of separate text and image models, preserving their individual strengths for the final prediction. Intermediate fusion integrates textual and visual features at intermediate layers, refining their interactions during processing. These fusion strategies combine the strengths of both textual and visual data, enhancing sentiment analysis by exploiting complementary information from multiple sources. The performance of our models was evaluated using various accuracy metrics, with SentimentTextFormer achieving 73.31% accuracy and SentimentImageFormer attaining 64.72%. The hybrid model, SentimentFormer (SwiftFormer with mBERT), employing intermediate fusion, shows a notable improvement in accuracy, achieving 79.04%, outperforming SentimentTextFormer by 5.73% and SentimentImageFormer by 14.32%. Among the fusion strategies, SentimentFormer (SwiftFormer with mBERT) achieved the highest accuracy of 79.04%, highlighting the effectiveness of our fusion technique and the reliability of our multimodal framework in improving sentiment analysis accuracy across diverse modalities. Full article
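The three fusion strategies compared above (early, intermediate, late) can be illustrated compactly on pre-extracted feature vectors. The dimensions and classifier heads below are placeholders; the actual mBERT and SwiftFormer backbones are not loaded here.

```python
import torch
import torch.nn as nn

B, dt, dv, C = 4, 768, 512, 3
text_feat, img_feat = torch.randn(B, dt), torch.randn(B, dv)   # stand-in encoder outputs

# Early fusion: concatenate modality features at the input of a single classifier.
early = nn.Sequential(nn.Linear(dt + dv, 256), nn.ReLU(), nn.Linear(256, C))
logits_early = early(torch.cat([text_feat, img_feat], dim=-1))

# Intermediate fusion: project each modality, let hidden representations interact, then classify.
proj_t, proj_v = nn.Linear(dt, 256), nn.Linear(dv, 256)
interact = nn.Linear(512, 256)
head = nn.Linear(256, C)
hidden = torch.relu(interact(torch.cat([torch.relu(proj_t(text_feat)),
                                        torch.relu(proj_v(img_feat))], dim=-1)))
logits_mid = head(hidden)

# Late fusion: independent unimodal classifiers, combined only at the decision level.
text_clf, img_clf = nn.Linear(dt, C), nn.Linear(dv, C)
logits_late = 0.5 * text_clf(text_feat) + 0.5 * img_clf(img_feat)

print(logits_early.shape, logits_mid.shape, logits_late.shape)
```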

13 pages, 349 KB  
Article
Multi-Task Supervised Alignment Pre-Training for Few-Shot Multimodal Sentiment Analysis
by Junyang Yang, Jiuxin Cao and Chengge Duan
Appl. Sci. 2025, 15(4), 2095; https://doi.org/10.3390/app15042095 - 17 Feb 2025
Cited by 2 | Viewed by 1075
Abstract
Few-shot multimodal sentiment analysis (FMSA) has garnered substantial attention due to the proliferation of multimedia applications, especially given the frequent difficulty in obtaining large quantities of training samples. Previous works have directly incorporated vision modality into the pre-trained language model (PLM) and then leveraged prompt learning, showing effectiveness in few-shot scenarios. However, these methods encounter challenges in aligning the high-level semantics of different modalities due to their inherent heterogeneity, which impacts the performance of sentiment analysis. In this paper, we propose a novel framework called Multi-task Supervised Alignment Pre-training (MSAP) to enhance modality alignment and consequently improve the performance of multimodal sentiment analysis. Our approach uses a multi-task training method—incorporating image classification, image style recognition, and image captioning—to extract modal-shared information and stronger semantics to improve visual representation. We employ task-specific prompts to unify these diverse objectives into a single Masked Language Model (MLM), which serves as the foundation for our Multi-task Supervised Alignment Pre-training (MSAP) framework to enhance the alignment of visual and textual modalities. Extensive experiments on three datasets demonstrate that our method achieves a new state-of-the-art for the FMSA task. Full article
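The sketch below shows only the prompt/verbalizer mechanics used to cast several supervision signals as a single masked-language-model task; in the actual framework, visual features would additionally be injected into the pre-trained language model. The prompts, verbalizer words, and checkpoint name are illustrative assumptions.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Task-specific prompts that all resolve to filling one [MASK] slot.
prompts = {
    "sentiment": "A rainy monday again . Overall it feels [MASK] .",
    "image_style": "The attached picture looks like a [MASK] photo .",
}
verbalizers = {
    "sentiment": ["good", "bad", "okay"],
    "image_style": ["realistic", "cartoon", "vintage"],
}

for task, prompt in prompts.items():
    enc = tok(prompt, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]          # scores at the [MASK] slot
    ids = tok.convert_tokens_to_ids(verbalizers[task])
    scores = logits[0, ids].softmax(-1)                  # restrict to the verbalizer words
    print(task, dict(zip(verbalizers[task], scores.tolist())))
```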

23 pages, 1149 KB  
Article
MGAFN-ISA: Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis
by Yifan Huo, Ming Liu, Junhong Zheng and Lili He
Electronics 2024, 13(24), 4905; https://doi.org/10.3390/electronics13244905 - 12 Dec 2024
Viewed by 1172
Abstract
Although significant progress has been made in sentiment analysis tasks based on image–text data, existing methods still have limitations in capturing cross-modal correlations and detailed information. To address these issues, we propose a Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis (MGAFN-ISA). MGAFN-ISA leverages neural networks and attention mechanisms to effectively reduce noise interference between different modalities and to capture distinct, fine-grained visual and textual features. The model includes two key feature extraction modules: a multi-scale attention fusion-based visual feature extractor and a hierarchical attention mechanism-based textual feature extractor, each designed to extract detailed and discriminative visual and textual representations. Additionally, we introduce an image translator engine to produce accurate and detailed image descriptions, further narrowing the semantic gap between the visual and textual modalities. A bidirectional cross-attention mechanism is also incorporated to utilize correlations between fine-grained local regions across modalities, extracting complementary information from heterogeneous visual and textual data. Finally, we designed an adaptive multimodal classification module that dynamically adjusts the contribution of each modality through an adaptive gating mechanism. Extensive experimental results demonstrate that MGAFN-ISA achieves a significant performance improvement over nine state-of-the-art methods across multiple public datasets, validating the effectiveness and advancement of our proposed approach. Full article
(This article belongs to the Section Artificial Intelligence)
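A minimal sketch of a bidirectional cross-attention block plus an adaptive gate that weighs the two modalities before classification, in the spirit of the components named above. The shapes, gate design, and dimensions are assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class BiCrossAttentionGate(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=3):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)  # text queries image regions
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)  # regions query text tokens
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, text_tok, vis_reg):
        t_att, _ = self.t2v(text_tok, vis_reg, vis_reg)       # text enriched by image regions
        v_att, _ = self.v2t(vis_reg, text_tok, text_tok)      # regions enriched by text
        t_pool, v_pool = t_att.mean(1), v_att.mean(1)
        w = self.gate(torch.cat([t_pool, v_pool], dim=-1))    # adaptive modality weights
        fused = w[:, :1] * t_pool + w[:, 1:] * v_pool
        return self.head(fused)

model = BiCrossAttentionGate()
logits = model(torch.randn(2, 40, 256), torch.randn(2, 49, 256))  # 40 tokens, 49 regions
print(logits.shape)
```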

16 pages, 2272 KB  
Article
Augmenting Multimodal Content Representation with Transformers for Misinformation Detection
by Jenq-Haur Wang, Mehdi Norouzi and Shu Ming Tsai
Big Data Cogn. Comput. 2024, 8(10), 134; https://doi.org/10.3390/bdcc8100134 - 11 Oct 2024
Cited by 3 | Viewed by 2069
Abstract
Information sharing on social media has become a common practice for people around the world. Since it is difficult to check user-generated content on social media, huge amounts of rumors and misinformation are being spread with authentic information. On the one hand, most of the social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post for detecting image–text inconsistency. In this paper, we propose a novel approach to misinformation detection by multimodal feature fusion with transformers and credibility assessment with self-attention-based Bi-RNN networks. Firstly, captions are derived from images using an image captioning module to obtain their semantic descriptions. These are compared with surrounding text by fine-tuning transformers for consistency check in semantics. Then, to further aggregate sentiment features into text representation, we fine-tune a separate transformer for text sentiment classification, where the output is concatenated to augment text embeddings. Finally, Multi-Cell Bi-GRUs with self-attention are used to train the credibility assessment model for misinformation detection. From the experimental results on tweets, the best performance with an accuracy of 0.904 and an F1-score of 0.921 can be obtained when applying feature fusion of augmented embeddings with sentiment classification results. This shows the potential of the innovative way of applying transformers in our proposed approach to misinformation detection. Further investigation is needed to validate the performance on various types of multimodal discrepancies. Full article
(This article belongs to the Special Issue Sustainable Big Data Analytics and Machine Learning Technologies)
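As a shape-level sketch of the credibility model described above: each post is represented as a short sequence of embeddings (e.g., post text and generated image caption) augmented with sentiment scores, then encoded by a bidirectional GRU with self-attention pooling. The embedding sizes, sequence layout, and the captioning and sentiment models themselves are placeholders here.

```python
import torch
import torch.nn as nn

class CredibilityClassifier(nn.Module):
    def __init__(self, emb_dim=768, sent_dim=3, hidden=128):
        super().__init__()
        self.gru = nn.GRU(emb_dim + sent_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)            # self-attention pooling over the sequence
        self.head = nn.Linear(2 * hidden, 2)           # credible vs. misinformation

    def forward(self, embeddings, sentiment):
        x = torch.cat([embeddings, sentiment], dim=-1) # augment embeddings with sentiment output
        h, _ = self.gru(x)
        alpha = torch.softmax(self.att(h), dim=1)
        pooled = (alpha * h).sum(dim=1)
        return self.head(pooled)

B, T = 4, 2                                            # e.g., [post text, image caption]
post_emb = torch.randn(B, T, 768)                      # stand-in transformer sentence embeddings
sent_scores = torch.randn(B, T, 3).softmax(-1)         # stand-in sentiment classifier outputs
print(CredibilityClassifier()(post_emb, sent_scores).shape)
```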

25 pages, 2680 KB  
Systematic Review
A Systematic Literature Review of Modalities, Trends, and Limitations in Emotion Recognition, Affective Computing, and Sentiment Analysis
by Rosa A. García-Hernández, Huizilopoztli Luna-García, José M. Celaya-Padilla, Alejandra García-Hernández, Luis C. Reveles-Gómez, Luis Alberto Flores-Chaires, J. Ruben Delgado-Contreras, David Rondon and Klinge O. Villalba-Condori
Appl. Sci. 2024, 14(16), 7165; https://doi.org/10.3390/app14167165 - 15 Aug 2024
Cited by 11 | Viewed by 13190
Abstract
This systematic literature review delves into the extensive landscape of emotion recognition, sentiment analysis, and affective computing, analyzing 609 articles. Exploring the intricate relationships among these research domains, and leveraging data from four well-established sources—IEEE, Science Direct, Springer, and MDPI—this systematic review classifies studies in four modalities based on the types of data analyzed. These modalities are unimodal, multi-physical, multi-physiological, and multi-physical–physiological. After the classification, key insights about applications, learning models, and data sources are extracted and analyzed. This review highlights the exponential growth in studies utilizing EEG signals for emotion recognition, and the potential of multimodal approaches combining physical and physiological signals to enhance the accuracy and practicality of emotion recognition systems. This comprehensive overview of research advances, emerging trends, and limitations from 2018 to 2023 underscores the importance of continued exploration and interdisciplinary collaboration in these rapidly evolving fields. Full article
(This article belongs to the Special Issue Application of Affective Computing)

29 pages, 6331 KB  
Article
Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning
by Diego Resende Faria, Abraham Itzhak Weinberg and Pedro Paulo Ayrosa
Appl. Sci. 2024, 14(15), 6631; https://doi.org/10.3390/app14156631 - 29 Jul 2024
Cited by 9 | Viewed by 4038
Abstract
Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human–robot interaction, and cross-cultural communication. Full article
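The layered decision-level fusion described above can be approximated, very loosely, by weighting class posteriors with validation accuracies at two levels: first across base classifiers within each modality, then across modalities. The sketch below is a generic weighted-posterior fusion, not the authors' dynamic Bayesian mixture model; all numbers are placeholders.

```python
import numpy as np

def weighted_fusion(posteriors, weights):
    """Weight and renormalize a stack of class-posterior vectors."""
    fused = np.average(posteriors, axis=0, weights=weights)
    return fused / fused.sum()

# Layer 1: fuse base classifiers within each modality (speech emotion, text sentiment).
ser_posteriors = np.array([[0.6, 0.3, 0.1],     # e.g., SVM
                           [0.5, 0.4, 0.1],     # e.g., RF
                           [0.7, 0.2, 0.1]])    # e.g., 1D-CNN
sa_posteriors = np.array([[0.2, 0.5, 0.3],      # e.g., BERT
                          [0.3, 0.4, 0.3]])     # e.g., LR
ser = weighted_fusion(ser_posteriors, weights=[0.9, 0.85, 0.92])  # validation accuracies
sa = weighted_fusion(sa_posteriors, weights=[0.88, 0.8])

# Layer 2: fuse the two modalities into a combined (second-level) emotional state.
final = weighted_fusion(np.stack([ser, sa]), weights=[0.96, 0.85])
print(final.round(3))
```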

16 pages, 4140 KB  
Article
MFSC: A Multimodal Aspect-Level Sentiment Classification Framework with Multi-Image Gate and Fusion Networks
by Lingling Zi, Xiangkai Pan and Xin Cong
Electronics 2024, 13(12), 2349; https://doi.org/10.3390/electronics13122349 - 15 Jun 2024
Viewed by 1314
Abstract
Currently, there is a great deal of interest in multimodal aspect-level sentiment classification using both textual and visual information, which changes the traditional use of only single-modal to identify sentiment polarity. Considering that existing methods could be strengthened in terms of classification accuracy, we conducted a study on aspect-level multimodal sentiment classification with the aim of exploring the interaction between textual and visual features. Specifically, we construct a multimodal aspect-level sentiment classification framework with multi-image gate and fusion networks called MFSC. MFSC consists of four parts, i.e., text feature extraction, visual feature extraction, text feature enhancement, and multi-feature fusion. Firstly, a bidirectional long short-term memory network is adopted to extract the initial text feature. Based on this, a text feature enhancement strategy is designed, which uses text memory network and adaptive weights to extract the final text features. Meanwhile, a multi-image gate method is proposed for fusing features from multiple images and filtering out irrelevant noise. Finally, a text-visual feature fusion method based on an attention mechanism is proposed to better improve the classification performance by capturing the association between text and images. Experimental results show that MFSC has advantages in classification accuracy and macro-F1. Full article
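A minimal sketch of the two mechanisms named above: a gate that scores several candidate images against the text representation (suppressing irrelevant ones), and an attention-based text-visual fusion for the final prediction. All dimensions and module choices are assumptions, not the MFSC configuration.

```python
import torch
import torch.nn as nn

class MultiImageGateFusion(nn.Module):
    def __init__(self, text_dim=256, img_dim=512, num_classes=3):
        super().__init__()
        self.text_enc = nn.LSTM(300, text_dim // 2, batch_first=True, bidirectional=True)
        self.img_proj = nn.Linear(img_dim, text_dim)
        self.gate = nn.Bilinear(text_dim, text_dim, 1)          # text-conditioned image relevance
        self.fuse = nn.MultiheadAttention(text_dim, 4, batch_first=True)
        self.head = nn.Linear(text_dim, num_classes)

    def forward(self, word_emb, image_feats):
        h, _ = self.text_enc(word_emb)                          # (B, T, text_dim)
        t = h.mean(1)                                           # pooled text representation
        imgs = self.img_proj(image_feats)                       # (B, K, text_dim)
        scores = self.gate(t.unsqueeze(1).expand_as(imgs).contiguous(), imgs)
        imgs = torch.sigmoid(scores) * imgs                     # gate out irrelevant images
        fused, _ = self.fuse(t.unsqueeze(1), imgs, imgs)        # text attends to gated images
        return self.head(fused.squeeze(1))

model = MultiImageGateFusion()
print(model(torch.randn(2, 30, 300), torch.randn(2, 4, 512)).shape)  # 30 words, 4 images
```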

23 pages, 18654 KB  
Article
A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism
by Keyuan Qiu, Yingjie Zhang, Jiaxu Zhao, Shun Zhang, Qian Wang and Feng Chen
Electronics 2024, 13(10), 1922; https://doi.org/10.3390/electronics13101922 - 14 May 2024
Cited by 13 | Viewed by 3590
Abstract
The objective of multimodal sentiment analysis is to extract and integrate feature information from text, image, and audio data accurately, in order to identify the emotional state of the speaker. While multimodal fusion schemes have made some progress in this research field, previous studies still lack adequate approaches for handling inter-modal information consistency and the fusion of different categorical features within a single modality. This study aims to effectively extract sentiment coherence information among video, audio, and text and consequently proposes a multimodal sentiment analysis method named joint chain interactive attention (VAE-JCIA, Video Audio Essay–Joint Chain Interactive Attention). In this approach, a 3D CNN is employed for extracting facial features from video, a Conformer is employed for extracting audio features, and a Funnel-Transformer is employed for extracting text features. Furthermore, the joint attention mechanism is utilized to identify key regions where sentiment information remains consistent across video, audio, and text. This process acquires reinforcing features that encapsulate information regarding consistency among the other two modalities. Inter-modal feature interactions are addressed through chained interactive attention, and multimodal feature fusion is employed to efficiently perform emotion classification. The method is experimentally validated on the CMU-MOSEI dataset and the IEMOCAP dataset. The experimental results demonstrate that the proposed method significantly enhances the performance of the multimodal sentiment analysis model. Full article
(This article belongs to the Section Artificial Intelligence)
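The sketch below illustrates a chained cross-modal attention pass of the kind the abstract describes: text features attend to audio, the result then attends to video, and the outcome is pooled for emotion classification. The actual encoders (3D CNN, Conformer, Funnel-Transformer) are replaced by random stand-in features, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChainedCrossAttention(nn.Module):
    def __init__(self, dim=128, heads=4, num_classes=6):
        super().__init__()
        self.text_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.joint_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, text, audio, video):
        joint, _ = self.text_audio(text, audio, audio)     # text queries audio
        joint, _ = self.joint_video(joint, video, video)   # the joint stream queries video
        return self.head(joint.mean(1))                    # pool over time and classify

B, dim = 2, 128
text = torch.randn(B, 20, dim)     # stand-in Funnel-Transformer token features
audio = torch.randn(B, 50, dim)    # stand-in Conformer frame features
video = torch.randn(B, 16, dim)    # stand-in 3D-CNN clip features
print(ChainedCrossAttention()(text, audio, video).shape)
```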

22 pages, 3031 KB  
Article
Research on Online Review Information Classification Based on Multimodal Deep Learning
by Jingnan Liu, Yefang Sun, Yueyi Zhang and Chenyuan Lu
Appl. Sci. 2024, 14(9), 3801; https://doi.org/10.3390/app14093801 - 29 Apr 2024
Cited by 1 | Viewed by 1748
Abstract
The incessant evolution of online platforms has ushered in a multitude of shopping modalities. Within the food industry, however, the delectability of meals can only be tentatively assessed based on consumer feedback encompassing aspects such as taste, pricing, packaging, service quality, delivery timeliness, hygiene standards, and environmental considerations. Traditional text data mining techniques primarily focus on consumers’ emotional traits, disregarding pertinent information about the online products themselves. In light of these issues in current research methodologies, this paper introduces the Bert BiGRU Softmax model combined with multimodal features to enhance the efficacy of sentiment classification in data analysis. Comparative experiments conducted using existing data demonstrate that the model employed in this study reaches an accuracy rate of 90.9%, exceeding the best-performing single models and three-model combinations by up to 7.7%, and proves to be highly applicable to online reviews. Full article
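A minimal sketch of a "BERT + BiGRU + softmax" review classifier that also accepts a small vector of additional review features (for example price, delivery-time, or rating fields). The checkpoint name, feature vector, and layer sizes are assumptions; the paper's exact multimodal feature set is not reproduced.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertBiGRUSoftmax(nn.Module):
    def __init__(self, checkpoint="bert-base-chinese", extra_dim=6, hidden=128, num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(checkpoint)
        self.gru = nn.GRU(self.bert.config.hidden_size, hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden + extra_dim, num_classes)

    def forward(self, input_ids, attention_mask, extra):
        tokens = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.gru(tokens)
        pooled = h.mean(dim=1)                              # average BiGRU states over tokens
        return self.head(torch.cat([pooled, extra], dim=-1)).softmax(dim=-1)

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
# Example review: "The food is tasty, but the delivery was too slow."
enc = tok(["菜品很好吃，但是配送太慢了。"], return_tensors="pt", padding=True, truncation=True)
extra = torch.randn(1, 6)                                    # stand-in structured review features
model = BertBiGRUSoftmax()
print(model(enc["input_ids"], enc["attention_mask"], extra))
```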

13 pages, 4445 KB  
Article
Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space
by Peranut Nimitsurachat and Peter Washington
AI 2024, 5(1), 195-207; https://doi.org/10.3390/ai5010011 - 17 Jan 2024
Cited by 3 | Viewed by 4574
Abstract
Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI)’s acoustic data. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data with 74 parameters of distinctive audio features at discrete timesteps. Our model is first pre-trained to uncover the randomly masked timesteps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via overall mean absolute error (MAE), mean absolute error (MAE) per emotion, overall four-class accuracy, and four-class accuracy per emotion. These metrics are compared against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics, especially when the number of annotated data points in the fine-tuning step is small. Furthermore, we quantify the behaviors of the self-supervised model and its convergence as the amount of annotated data increases. This work characterizes the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small and that the effect is most pronounced for emotions which are easier to classify such as happy, sad, and angry. This work further demonstrates that self-supervised learning still improves performance when applied to the embedded feature representations rather than the traditional approach of pre-training on the raw input space. Full article
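A sketch of the self-supervised recipe described above: a small transformer over engineered acoustic feature vectors (74 values per timestep) is pre-trained to reconstruct randomly masked timesteps, and its encoder is then reused with a classification head on the labeled subset. The sizes, mask ratio, masking scheme, and training-loop details are illustrative assumptions.

```python
import torch
import torch.nn as nn

FEAT_DIM, MODEL_DIM, T = 74, 128, 100

class AcousticEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.inp = nn.Linear(FEAT_DIM, MODEL_DIM)
        layer = nn.TransformerEncoderLayer(MODEL_DIM, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.recon = nn.Linear(MODEL_DIM, FEAT_DIM)      # used only during pre-training

    def forward(self, x):
        return self.enc(self.inp(x))

encoder = AcousticEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# Pre-training step: mask ~15% of timesteps and reconstruct their original feature vectors.
x = torch.randn(8, T, FEAT_DIM)                          # stand-in encoded acoustic segments
mask = torch.rand(8, T) < 0.15
x_in = x.clone()
x_in[mask] = 0.0                                         # simple zero-masking of timesteps
recon = encoder.recon(encoder(x_in))
loss = nn.functional.mse_loss(recon[mask], x[mask])
loss.backward(); opt.step(); opt.zero_grad()

# Fine-tuning: reuse the pre-trained encoder with a small emotion head on labeled data.
head = nn.Linear(MODEL_DIM, 4)                           # e.g., four emotion classes
labels = torch.randint(0, 4, (8,))
logits = head(encoder(x).mean(dim=1))
print(nn.functional.cross_entropy(logits, labels).item())
```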

29 pages, 559 KB  
Review
A Brief Survey of Machine Learning and Deep Learning Techniques for E-Commerce Research
by Xue Zhang, Fusen Guo, Tao Chen, Lei Pan, Gleb Beliakov and Jianzhang Wu
J. Theor. Appl. Electron. Commer. Res. 2023, 18(4), 2188-2216; https://doi.org/10.3390/jtaer18040110 - 4 Dec 2023
Cited by 52 | Viewed by 15207
Abstract
The rapid growth of e-commerce has significantly increased the demand for advanced techniques to address specific tasks in the e-commerce field. In this paper, we present a brief survey of machine learning and deep learning techniques in the context of e-commerce, focusing on the years 2018–2023 in a Google Scholar search, with the aim of identifying state-of-the-art approaches, main topics, and potential challenges in the field. We first introduce the applied machine learning and deep learning techniques, spanning from support vector machines, decision trees, and random forests to conventional neural networks, recurrent neural networks, generative adversarial networks, and beyond. Next, we summarize the main topics, including sentiment analysis, recommendation systems, fake review detection, fraud detection, customer churn prediction, customer purchase behavior prediction, prediction of sales, product classification, and image recognition. Finally, we discuss the main challenges and trends, which are related to imbalanced data, over-fitting and generalization, multi-modal learning, interpretability, personalization, chatbots, and virtual assistance. This survey offers a concise overview of the current state and future directions regarding the use of machine learning and deep learning techniques in the context of e-commerce. Further research and development will be necessary to address the evolving challenges and opportunities presented by the dynamic e-commerce landscape. Full article
(This article belongs to the Section e-Commerce Analytics)