Search Results (32)

Search Parameters:
Keywords = multimodal sentiment classification

22 pages, 2498 KB  
Article
SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability
by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen
Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 - 2 Aug 2025
Viewed by 503
Abstract
How do humans analyze the sentiments embedded in text? When attempting to analyze a text, humans construct a “scene” in their minds through imagination based on the text, generating a vague image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can only analyze based on existing information in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable diffusion, enabling the model to generate corresponding visual scenes from input text, thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extraction model, to extract aligned features from different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse information from different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we set up experiments to deeply analyze the model’s advantages and limitations, providing a new technical path for follow-up research. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
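As a rough illustration of the generate-then-fuse pipeline this abstract describes (text-to-image generation, CLIP feature extraction for both modalities, late fusion), the sketch below wires together off-the-shelf components. It is not the authors' implementation; the model identifiers, classifier heads, and equal fusion weights are assumptions.

```python
# Sketch: text -> Stable Diffusion "scene" image -> CLIP features -> late fusion.
import torch
import torch.nn as nn
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

text = "The rain would not stop, and neither would the tears."
image = sd(text, num_inference_steps=25).images[0]        # imagined "scene" for the text

inputs = proc(text=[text], images=image, return_tensors="pt",
              padding=True, truncation=True).to(device)
with torch.no_grad():
    t_feat = clip.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])   # (1, 512)
    v_feat = clip.get_image_features(pixel_values=inputs["pixel_values"])      # (1, 512)

num_classes = 2
text_head = nn.Linear(512, num_classes).to(device)        # unimodal classifier heads
image_head = nn.Linear(512, num_classes).to(device)
logits = 0.5 * text_head(t_feat) + 0.5 * image_head(v_feat)   # late fusion: average logits
print(logits.softmax(dim=-1))
```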

16 pages, 6601 KB  
Article
Dynamic Tuning and Multi-Task Learning-Based Model for Multimodal Sentiment Analysis
by Yi Liang, Turdi Tohti, Wenpeng Hu, Bo Kong, Dongfang Han, Tianwei Yan and Askar Hamdulla
Appl. Sci. 2025, 15(11), 6342; https://doi.org/10.3390/app15116342 - 5 Jun 2025
Viewed by 795
Abstract
Multimodal sentiment analysis aims to uncover human affective states by integrating data from multiple sensory sources. However, previous studies have focused on optimizing model architecture, neglecting the impact of objective function settings on model performance. To address this, this study introduces a new framework, DMMSA, which utilizes the intrinsic correlation of sentiment signals and enhances the model’s understanding of complex sentiments. DMMSA incorporates coarse-grained sentiment analysis to reduce task complexity. Meanwhile, it embeds a contrastive learning mechanism within each modality, which decomposes unimodal features into similar and dissimilar ones, thus allowing for the simultaneous consideration of both unimodal and multimodal emotions. We tested DMMSA on the CH-SIMS, MOSI, and MOSEI datasets. When only changing the optimization objectives, DMMSA achieved accuracy gains of 3.2%, 1.57%, and 1.95% over the baseline in five-class and seven-class classification tasks. In regression tasks, DMMSA reduced the Mean Absolute Error (MAE) by 1.46%, 1.5%, and 2.8% compared to the baseline. Full article
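A toy sketch of the kind of multi-task objective described above: a fine-grained regression loss, an auxiliary coarse-grained polarity loss, and an intra-modal contrastive term. This is not the DMMSA code; the loss weights, dimensions, and the exact contrastive formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def intra_modal_contrastive(z, polarity, temperature=0.1):
    """Pull together unimodal features with the same coarse polarity, push apart the rest."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature                       # (B, B) cosine similarities
    mask = polarity.unsqueeze(0).eq(polarity.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                              # ignore self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (mask * log_prob).sum(1) / mask.sum(1).clamp(min=1)
    return -pos.mean()

B = 8
text_feat = torch.randn(B, 128)                         # stand-in unimodal features
score_pred, score_true = torch.randn(B), torch.randn(B) # fine-grained sentiment scores
coarse_logits = torch.randn(B, 3)                       # negative / neutral / positive
coarse_label = (score_true.sign() + 1).long()           # coarse labels derived from scores

loss = (F.l1_loss(score_pred, score_true)               # fine-grained regression (MAE)
        + 0.5 * F.cross_entropy(coarse_logits, coarse_label)   # coarse auxiliary task
        + 0.1 * intra_modal_contrastive(text_feat, coarse_label))
print(loss.item())
```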

27 pages, 8770 KB  
Article
Evaluation of Rural Visual Landscape Quality Based on Multi-Source Affective Computing
by Xinyu Zhao, Lin Lin, Xiao Guo, Zhisheng Wang and Ruixuan Li
Appl. Sci. 2025, 15(9), 4905; https://doi.org/10.3390/app15094905 - 28 Apr 2025
Viewed by 767
Abstract
Assessing the visual quality of rural landscapes is pivotal for quantifying ecological services and preserving cultural heritage; however, conventional ecological indicators neglect emotional and cognitive dimensions. To address this gap, the present study proposes a novel visual quality assessment method for rural landscapes that integrates multimodal sentiment classification models to strengthen sustainability metrics. Four landscape types were selected from three representative villages in Dalian City, China, and the physiological signals (EEG, EOG) and subjective evaluations (Beauty Assessment and SAM Scales) of students and teachers were recorded. Binary, ternary, and five-category emotion classification models were then developed. Results indicate that the binary and ternary models achieve superior accuracy in emotional valence and arousal, whereas the five-category model performs least effectively. Furthermore, an ensemble learning approach outperforms individual classifiers in both binary and ternary tasks, yielding a 16.54% increase in mean accuracy. Integrating subjective and objective data further enhances ternary classification accuracy by 7.7% compared to existing studies, confirming the value of multi-source features. These findings demonstrate that a multi-source sentiment computing framework can serve as a robust quantitative tool for evaluating emotional quality in rural landscapes and promoting their sustainable development. Full article
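For the ensemble step reported above, a generic soft-voting ensemble over concatenated multi-source features (physiological statistics plus subjective scale scores) might look like the sketch below. The synthetic data, feature layout, and classifier choices are placeholders, not the study's protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_physio = rng.normal(size=(200, 32))      # stand-in EEG/EOG features per stimulus
X_subj = rng.normal(size=(200, 4))         # stand-in beauty / SAM scale ratings
X = np.hstack([X_physio, X_subj])          # multi-source feature vector
y = rng.integers(0, 3, size=200)           # ternary valence labels (placeholder)

ensemble = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",                          # average class probabilities across members
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```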

20 pages, 1112 KB  
Article
Multimodal Emotion Recognition Method Based on Domain Generalization and Graph Neural Networks
by Jinbao Xie, Yulong Wang, Tianxin Meng, Jianqiao Tai, Yueqian Zheng and Yury I. Varatnitski
Electronics 2025, 14(5), 885; https://doi.org/10.3390/electronics14050885 - 23 Feb 2025
Viewed by 2212
Abstract
In recent years, multimodal sentiment analysis has attracted increasing attention from researchers owing to the rapid development of human–computer interactions. Sentiment analysis is an important task for understanding dialogues. However, with the increase of multimodal data, the processing of individual modality features and the methods for multimodal feature fusion have become more significant for research. Existing methods that handle the features of each modality separately are not suitable for subsequent multimodal fusion and often fail to capture sufficient global and local information. Therefore, this study proposes a novel multimodal sentiment analysis method based on domain generalization and graph neural networks. The main characteristic of this method is that it considers the features of each modality as domains. It extracts domain-specific and cross-domain-invariant features, thereby facilitating cross-domain generalization. Generalized features are more suitable for multimodal fusion. Graph neural networks were employed to extract global and local information from the dialogue to capture the emotional changes of the speakers. Specifically, global representations were captured by modeling cross-modal interactions at the dialogue level, whereas local information was typically inferred from temporal information or the emotional changes of the speakers. The method proposed in this study outperformed existing models on the IEMOCAP, CMU-MOSEI, and MELD datasets by 0.97%, 1.09% (for seven-class classification), and 0.65% in terms of weighted F1 score, respectively. This clearly demonstrates that the domain-generalized features proposed in this study are better suited for subsequent multimodal fusion, and that the model developed here is more effective at capturing both global and local information. Full article
(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)
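The sketch below illustrates the two ideas named in the abstract, under stated assumptions: each modality is split into a shared (cross-domain-invariant) and a private (domain-specific) representation with an orthogonality penalty, and utterance features are then propagated over a simple dialogue graph. Layer sizes, the penalty, and the graph construction are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ModalitySplit(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim)    # invariant across modalities
        self.private = nn.Linear(in_dim, out_dim)   # specific to this modality

    def forward(self, x):
        s, p = self.shared(x), self.private(x)
        ortho = (nn.functional.normalize(s, dim=-1) *
                 nn.functional.normalize(p, dim=-1)).sum(-1).pow(2).mean()
        return s, p, ortho                          # penalty keeps the two parts distinct

def gcn_layer(h, adj, weight):
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return torch.relu((adj @ h) / deg @ weight)     # mean aggregation over neighbours

T, d = 6, 64                                        # utterances in a dialogue, feature size
text, audio = torch.randn(T, d), torch.randn(T, d)
split_t, split_a = ModalitySplit(d, 32), ModalitySplit(d, 32)
st, pt, o1 = split_t(text)
sa, pa, o2 = split_a(audio)
h = torch.cat([st + sa, pt, pa], dim=-1)            # fuse generalized + specific features

adj = (torch.arange(T).unsqueeze(0) - torch.arange(T).unsqueeze(1)).abs() <= 2
h = gcn_layer(h, adj.float(), torch.randn(96, 96))  # local temporal context in the dialogue
print(h.shape)
```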

33 pages, 2092 KB  
Article
SentimentFormer: A Transformer-Based Multimodal Fusion Framework for Enhanced Sentiment Analysis of Memes in Under-Resourced Bangla Language
by Fatema Tuj Johora Faria, Laith H. Baniata, Mohammad H. Baniata, Mohannad A. Khair, Ahmed Ibrahim Bani Ata, Chayut Bunterngchit and Sangwoo Kang
Electronics 2025, 14(4), 799; https://doi.org/10.3390/electronics14040799 - 18 Feb 2025
Cited by 1 | Viewed by 2923
Abstract
Social media has increasingly relied on memes as a tool for expressing opinions, making meme sentiment analysis an emerging area of interest for researchers. While much of the research has focused on English-language memes, under-resourced languages, such as Bengali, have received limited attention. Given the surge in social media use, the need for sentiment analysis of memes in these languages has become critical. One of the primary challenges in this field is the lack of benchmark datasets, particularly in languages with fewer resources. To address this, we used the MemoSen dataset, designed for Bengali, which consists of 4368 memes annotated with three sentiment labels: positive, negative, and neutral. MemoSen is divided into training (70%), test (20%), and validation (10%) sets, with an imbalanced class distribution: 1349 memes in the positive class, 2728 in the negative class, and 291 in the neutral class. Our approach leverages advanced deep learning techniques for multimodal sentiment analysis in Bengali, introducing three hybrid approaches. SentimentTextFormer is a text-based, fine-tuned model that utilizes state-of-the-art transformer architectures to accurately extract sentiment-related insights from Bengali text, capturing nuanced linguistic features. SentimentImageFormer is an image-based model that employs cutting-edge transformer-based techniques for precise sentiment classification through visual data. Lastly, SentimentFormer is a hybrid model that seamlessly integrates both text and image modalities using fusion strategies. Early fusion combines textual and visual features at the input level, enabling the model to jointly learn from both modalities. Late fusion merges the outputs of separate text and image models, preserving their individual strengths for the final prediction. Intermediate fusion integrates textual and visual features at intermediate layers, refining their interactions during processing. These fusion strategies combine the strengths of both textual and visual data, enhancing sentiment analysis by exploiting complementary information from multiple sources. The performance of our models was evaluated using various accuracy metrics, with SentimentTextFormer achieving 73.31% accuracy and SentimentImageFormer attaining 64.72%. The hybrid model, SentimentFormer (SwiftFormer with mBERT), employing intermediate fusion, shows a notable improvement in accuracy, achieving 79.04%, outperforming SentimentTextFormer by 5.73% and SentimentImageFormer by 14.32%. Among the fusion strategies, SentimentFormer (SwiftFormer with mBERT) achieved the highest accuracy of 79.04%, highlighting the effectiveness of our fusion technique and the reliability of our multimodal framework in improving sentiment analysis accuracy across diverse modalities. Full article
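The three fusion strategies compared above (early, intermediate, late) can be illustrated compactly on pre-extracted feature vectors. The dimensions and classifier heads below are placeholders; the actual mBERT and SwiftFormer backbones are not loaded here.

```python
import torch
import torch.nn as nn

B, dt, dv, C = 4, 768, 512, 3
text_feat, img_feat = torch.randn(B, dt), torch.randn(B, dv)   # stand-in encoder outputs

# Early fusion: concatenate modality features at the input of a single classifier.
early = nn.Sequential(nn.Linear(dt + dv, 256), nn.ReLU(), nn.Linear(256, C))
logits_early = early(torch.cat([text_feat, img_feat], dim=-1))

# Intermediate fusion: project each modality, let hidden representations interact, then classify.
proj_t, proj_v = nn.Linear(dt, 256), nn.Linear(dv, 256)
interact = nn.Linear(512, 256)
head = nn.Linear(256, C)
hidden = torch.relu(interact(torch.cat([torch.relu(proj_t(text_feat)),
                                        torch.relu(proj_v(img_feat))], dim=-1)))
logits_mid = head(hidden)

# Late fusion: independent unimodal classifiers, combined only at the decision level.
text_clf, img_clf = nn.Linear(dt, C), nn.Linear(dv, C)
logits_late = 0.5 * text_clf(text_feat) + 0.5 * img_clf(img_feat)

print(logits_early.shape, logits_mid.shape, logits_late.shape)
```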

13 pages, 349 KB  
Article
Multi-Task Supervised Alignment Pre-Training for Few-Shot Multimodal Sentiment Analysis
by Junyang Yang, Jiuxin Cao and Chengge Duan
Appl. Sci. 2025, 15(4), 2095; https://doi.org/10.3390/app15042095 - 17 Feb 2025
Cited by 2 | Viewed by 1075
Abstract
Few-shot multimodal sentiment analysis (FMSA) has garnered substantial attention due to the proliferation of multimedia applications, especially given the frequent difficulty in obtaining large quantities of training samples. Previous works have directly incorporated vision modality into the pre-trained language model (PLM) and then leveraged prompt learning, showing effectiveness in few-shot scenarios. However, these methods encounter challenges in aligning the high-level semantics of different modalities due to their inherent heterogeneity, which impacts the performance of sentiment analysis. In this paper, we propose a novel framework called Multi-task Supervised Alignment Pre-training (MSAP) to enhance modality alignment and consequently improve the performance of multimodal sentiment analysis. Our approach uses a multi-task training method—incorporating image classification, image style recognition, and image captioning—to extract modal-shared information and stronger semantics to improve visual representation. We employ task-specific prompts to unify these diverse objectives into a single Masked Language Model (MLM), which serves as the foundation for our Multi-task Supervised Alignment Pre-training (MSAP) framework to enhance the alignment of visual and textual modalities. Extensive experiments on three datasets demonstrate that our method achieves a new state-of-the-art for the FMSA task. Full article
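The sketch below shows only the prompt/verbalizer mechanics used to cast several supervision signals as a single masked-language-model task; in the actual framework, visual features would additionally be injected into the pre-trained language model. The prompts, verbalizer words, and checkpoint name are illustrative assumptions.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Task-specific prompts that all resolve to filling one [MASK] slot.
prompts = {
    "sentiment": "A rainy monday again . Overall it feels [MASK] .",
    "image_style": "The attached picture looks like a [MASK] photo .",
}
verbalizers = {
    "sentiment": ["good", "bad", "okay"],
    "image_style": ["realistic", "cartoon", "vintage"],
}

for task, prompt in prompts.items():
    enc = tok(prompt, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]          # scores at the [MASK] slot
    ids = tok.convert_tokens_to_ids(verbalizers[task])
    scores = logits[0, ids].softmax(-1)                  # restrict to the verbalizer words
    print(task, dict(zip(verbalizers[task], scores.tolist())))
```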

23 pages, 1149 KB  
Article
MGAFN-ISA: Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis
by Yifan Huo, Ming Liu, Junhong Zheng and Lili He
Electronics 2024, 13(24), 4905; https://doi.org/10.3390/electronics13244905 - 12 Dec 2024
Viewed by 1172
Abstract
Although significant progress has been made in sentiment analysis tasks based on image–text data, existing methods still have limitations in capturing cross-modal correlations and detailed information. To address these issues, we propose a Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis (MGAFN-ISA). MGAFN-ISA leverages neural networks and attention mechanisms to effectively reduce noise interference between different modalities and to capture distinct, fine-grained visual and textual features. The model includes two key feature extraction modules: a multi-scale attention fusion-based visual feature extractor and a hierarchical attention mechanism-based textual feature extractor, each designed to extract detailed and discriminative visual and textual representations. Additionally, we introduce an image translator engine to produce accurate and detailed image descriptions, further narrowing the semantic gap between the visual and textual modalities. A bidirectional cross-attention mechanism is also incorporated to utilize correlations between fine-grained local regions across modalities, extracting complementary information from heterogeneous visual and textual data. Finally, we designed an adaptive multimodal classification module that dynamically adjusts the contribution of each modality through an adaptive gating mechanism. Extensive experimental results demonstrate that MGAFN-ISA achieves a significant performance improvement over nine state-of-the-art methods across multiple public datasets, validating the effectiveness and advancement of our proposed approach. Full article
(This article belongs to the Section Artificial Intelligence)
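A minimal sketch of a bidirectional cross-attention block plus an adaptive gate that weighs the two modalities before classification, in the spirit of the components named above. The shapes, gate design, and dimensions are assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class BiCrossAttentionGate(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=3):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)  # text queries image regions
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)  # regions query text tokens
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, text_tok, vis_reg):
        t_att, _ = self.t2v(text_tok, vis_reg, vis_reg)       # text enriched by image regions
        v_att, _ = self.v2t(vis_reg, text_tok, text_tok)      # regions enriched by text
        t_pool, v_pool = t_att.mean(1), v_att.mean(1)
        w = self.gate(torch.cat([t_pool, v_pool], dim=-1))    # adaptive modality weights
        fused = w[:, :1] * t_pool + w[:, 1:] * v_pool
        return self.head(fused)

model = BiCrossAttentionGate()
logits = model(torch.randn(2, 40, 256), torch.randn(2, 49, 256))  # 40 tokens, 49 regions
print(logits.shape)
```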

16 pages, 2272 KB  
Article
Augmenting Multimodal Content Representation with Transformers for Misinformation Detection
by Jenq-Haur Wang, Mehdi Norouzi and Shu Ming Tsai
Big Data Cogn. Comput. 2024, 8(10), 134; https://doi.org/10.3390/bdcc8100134 - 11 Oct 2024
Cited by 3 | Viewed by 2069
Abstract
Information sharing on social media has become a common practice for people around the world. Since it is difficult to check user-generated content on social media, huge amounts of rumors and misinformation are being spread with authentic information. On the one hand, most of the social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post for detecting image–text inconsistency. In this paper, we propose a novel approach to misinformation detection by multimodal feature fusion with transformers and credibility assessment with self-attention-based Bi-RNN networks. Firstly, captions are derived from images using an image captioning module to obtain their semantic descriptions. These are compared with surrounding text by fine-tuning transformers for consistency check in semantics. Then, to further aggregate sentiment features into text representation, we fine-tune a separate transformer for text sentiment classification, where the output is concatenated to augment text embeddings. Finally, Multi-Cell Bi-GRUs with self-attention are used to train the credibility assessment model for misinformation detection. From the experimental results on tweets, the best performance with an accuracy of 0.904 and an F1-score of 0.921 can be obtained when applying feature fusion of augmented embeddings with sentiment classification results. This shows the potential of the innovative way of applying transformers in our proposed approach to misinformation detection. Further investigation is needed to validate the performance on various types of multimodal discrepancies. Full article
(This article belongs to the Special Issue Sustainable Big Data Analytics and Machine Learning Technologies)
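As a shape-level sketch of the credibility model described above: each post is represented as a short sequence of embeddings (e.g., post text and generated image caption) augmented with sentiment scores, then encoded by a bidirectional GRU with self-attention pooling. The embedding sizes, sequence layout, and the captioning and sentiment models themselves are placeholders here.

```python
import torch
import torch.nn as nn

class CredibilityClassifier(nn.Module):
    def __init__(self, emb_dim=768, sent_dim=3, hidden=128):
        super().__init__()
        self.gru = nn.GRU(emb_dim + sent_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)            # self-attention pooling over the sequence
        self.head = nn.Linear(2 * hidden, 2)           # credible vs. misinformation

    def forward(self, embeddings, sentiment):
        x = torch.cat([embeddings, sentiment], dim=-1) # augment embeddings with sentiment output
        h, _ = self.gru(x)
        alpha = torch.softmax(self.att(h), dim=1)
        pooled = (alpha * h).sum(dim=1)
        return self.head(pooled)

B, T = 4, 2                                            # e.g., [post text, image caption]
post_emb = torch.randn(B, T, 768)                      # stand-in transformer sentence embeddings
sent_scores = torch.randn(B, T, 3).softmax(-1)         # stand-in sentiment classifier outputs
print(CredibilityClassifier()(post_emb, sent_scores).shape)
```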

25 pages, 2680 KB  
Systematic Review
A Systematic Literature Review of Modalities, Trends, and Limitations in Emotion Recognition, Affective Computing, and Sentiment Analysis
by Rosa A. García-Hernández, Huizilopoztli Luna-García, José M. Celaya-Padilla, Alejandra García-Hernández, Luis C. Reveles-Gómez, Luis Alberto Flores-Chaires, J. Ruben Delgado-Contreras, David Rondon and Klinge O. Villalba-Condori
Appl. Sci. 2024, 14(16), 7165; https://doi.org/10.3390/app14167165 - 15 Aug 2024
Cited by 11 | Viewed by 13190
Abstract
This systematic literature review delves into the extensive landscape of emotion recognition, sentiment analysis, and affective computing, analyzing 609 articles. Exploring the intricate relationships among these research domains, and leveraging data from four well-established sources—IEEE, Science Direct, Springer, and MDPI—this systematic review classifies studies in four modalities based on the types of data analyzed. These modalities are unimodal, multi-physical, multi-physiological, and multi-physical–physiological. After the classification, key insights about applications, learning models, and data sources are extracted and analyzed. This review highlights the exponential growth in studies utilizing EEG signals for emotion recognition, and the potential of multimodal approaches combining physical and physiological signals to enhance the accuracy and practicality of emotion recognition systems. This comprehensive overview of research advances, emerging trends, and limitations from 2018 to 2023 underscores the importance of continued exploration and interdisciplinary collaboration in these rapidly evolving fields. Full article
(This article belongs to the Special Issue Application of Affective Computing)

29 pages, 6331 KB  
Article
Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning
by Diego Resende Faria, Abraham Itzhak Weinberg and Pedro Paulo Ayrosa
Appl. Sci. 2024, 14(15), 6631; https://doi.org/10.3390/app14156631 - 29 Jul 2024
Cited by 9 | Viewed by 4038
Abstract
Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human–robot interaction, and cross-cultural communication. Full article
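The layered decision-level fusion described above can be approximated, very loosely, by weighting class posteriors with validation accuracies at two levels: first across base classifiers within each modality, then across modalities. The sketch below is a generic weighted-posterior fusion, not the authors' dynamic Bayesian mixture model; all numbers are placeholders.

```python
import numpy as np

def weighted_fusion(posteriors, weights):
    """Weight and renormalize a stack of class-posterior vectors."""
    fused = np.average(posteriors, axis=0, weights=weights)
    return fused / fused.sum()

# Layer 1: fuse base classifiers within each modality (speech emotion, text sentiment).
ser_posteriors = np.array([[0.6, 0.3, 0.1],     # e.g., SVM
                           [0.5, 0.4, 0.1],     # e.g., RF
                           [0.7, 0.2, 0.1]])    # e.g., 1D-CNN
sa_posteriors = np.array([[0.2, 0.5, 0.3],      # e.g., BERT
                          [0.3, 0.4, 0.3]])     # e.g., LR
ser = weighted_fusion(ser_posteriors, weights=[0.9, 0.85, 0.92])  # validation accuracies
sa = weighted_fusion(sa_posteriors, weights=[0.88, 0.8])

# Layer 2: fuse the two modalities into a combined (second-level) emotional state.
final = weighted_fusion(np.stack([ser, sa]), weights=[0.96, 0.85])
print(final.round(3))
```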

16 pages, 4140 KB  
Article
MFSC: A Multimodal Aspect-Level Sentiment Classification Framework with Multi-Image Gate and Fusion Networks
by Lingling Zi, Xiangkai Pan and Xin Cong
Electronics 2024, 13(12), 2349; https://doi.org/10.3390/electronics13122349 - 15 Jun 2024
Viewed by 1314
Abstract
Currently, there is a great deal of interest in multimodal aspect-level sentiment classification using both textual and visual information, which changes the traditional use of only single-modal to identify sentiment polarity. Considering that existing methods could be strengthened in terms of classification accuracy, we conducted a study on aspect-level multimodal sentiment classification with the aim of exploring the interaction between textual and visual features. Specifically, we construct a multimodal aspect-level sentiment classification framework with multi-image gate and fusion networks called MFSC. MFSC consists of four parts, i.e., text feature extraction, visual feature extraction, text feature enhancement, and multi-feature fusion. Firstly, a bidirectional long short-term memory network is adopted to extract the initial text feature. Based on this, a text feature enhancement strategy is designed, which uses text memory network and adaptive weights to extract the final text features. Meanwhile, a multi-image gate method is proposed for fusing features from multiple images and filtering out irrelevant noise. Finally, a text-visual feature fusion method based on an attention mechanism is proposed to better improve the classification performance by capturing the association between text and images. Experimental results show that MFSC has advantages in classification accuracy and macro-F1. Full article
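A minimal sketch of the two mechanisms named above: a gate that scores several candidate images against the text representation (suppressing irrelevant ones), and an attention-based text-visual fusion for the final prediction. All dimensions and module choices are assumptions, not the MFSC configuration.

```python
import torch
import torch.nn as nn

class MultiImageGateFusion(nn.Module):
    def __init__(self, text_dim=256, img_dim=512, num_classes=3):
        super().__init__()
        self.text_enc = nn.LSTM(300, text_dim // 2, batch_first=True, bidirectional=True)
        self.img_proj = nn.Linear(img_dim, text_dim)
        self.gate = nn.Bilinear(text_dim, text_dim, 1)          # text-conditioned image relevance
        self.fuse = nn.MultiheadAttention(text_dim, 4, batch_first=True)
        self.head = nn.Linear(text_dim, num_classes)

    def forward(self, word_emb, image_feats):
        h, _ = self.text_enc(word_emb)                          # (B, T, text_dim)
        t = h.mean(1)                                           # pooled text representation
        imgs = self.img_proj(image_feats)                       # (B, K, text_dim)
        scores = self.gate(t.unsqueeze(1).expand_as(imgs).contiguous(), imgs)
        imgs = torch.sigmoid(scores) * imgs                     # gate out irrelevant images
        fused, _ = self.fuse(t.unsqueeze(1), imgs, imgs)        # text attends to gated images
        return self.head(fused.squeeze(1))

model = MultiImageGateFusion()
print(model(torch.randn(2, 30, 300), torch.randn(2, 4, 512)).shape)  # 30 words, 4 images
```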

23 pages, 18654 KB  
Article
A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism
by Keyuan Qiu, Yingjie Zhang, Jiaxu Zhao, Shun Zhang, Qian Wang and Feng Chen
Electronics 2024, 13(10), 1922; https://doi.org/10.3390/electronics13101922 - 14 May 2024
Cited by 13 | Viewed by 3590
Abstract
The objective of multimodal sentiment analysis is to extract and integrate feature information from text, image, and audio data accurately, in order to identify the emotional state of the speaker. While multimodal fusion schemes have made some progress in this research field, previous studies still lack adequate approaches for handling inter-modal information consistency and the fusion of different categorical features within a single modality. This study aims to effectively extract sentiment coherence information among video, audio, and text and consequently proposes a multimodal sentiment analysis method named joint chain interactive attention (VAE-JCIA, Video Audio Essay–Joint Chain Interactive Attention). In this approach, a 3D CNN is employed for extracting facial features from video, a Conformer is employed for extracting audio features, and a Funnel-Transformer is employed for extracting text features. Furthermore, the joint attention mechanism is utilized to identify key regions where sentiment information remains consistent across video, audio, and text. This process acquires reinforcing features that encapsulate information regarding consistency among the other two modalities. Inter-modal feature interactions are addressed through chained interactive attention, and multimodal feature fusion is employed to efficiently perform emotion classification. The method is experimentally validated on the CMU-MOSEI dataset and the IEMOCAP dataset. The experimental results demonstrate that the proposed method significantly enhances the performance of the multimodal sentiment analysis model. Full article
(This article belongs to the Section Artificial Intelligence)
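The sketch below illustrates a chained cross-modal attention pass of the kind the abstract describes: text features attend to audio, the result then attends to video, and the outcome is pooled for emotion classification. The actual encoders (3D CNN, Conformer, Funnel-Transformer) are replaced by random stand-in features, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChainedCrossAttention(nn.Module):
    def __init__(self, dim=128, heads=4, num_classes=6):
        super().__init__()
        self.text_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.joint_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, text, audio, video):
        joint, _ = self.text_audio(text, audio, audio)     # text queries audio
        joint, _ = self.joint_video(joint, video, video)   # the joint stream queries video
        return self.head(joint.mean(1))                    # pool over time and classify

B, dim = 2, 128
text = torch.randn(B, 20, dim)     # stand-in Funnel-Transformer token features
audio = torch.randn(B, 50, dim)    # stand-in Conformer frame features
video = torch.randn(B, 16, dim)    # stand-in 3D-CNN clip features
print(ChainedCrossAttention()(text, audio, video).shape)
```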

22 pages, 3031 KB  
Article
Research on Online Review Information Classification Based on Multimodal Deep Learning
by Jingnan Liu, Yefang Sun, Yueyi Zhang and Chenyuan Lu
Appl. Sci. 2024, 14(9), 3801; https://doi.org/10.3390/app14093801 - 29 Apr 2024
Cited by 1 | Viewed by 1748
Abstract
The incessant evolution of online platforms has ushered in a multitude of shopping modalities. Within the food industry, however, the delectability of meals can only be tentatively assessed based on consumer feedback encompassing aspects such as taste, pricing, packaging, service quality, delivery timeliness, hygiene standards, and environmental considerations. Traditional text data mining techniques primarily focus on consumers’ emotional traits, disregarding pertinent information about the online products themselves. In light of these issues in current research methodologies, this paper introduces the Bert BiGRU Softmax model combined with multimodal features to enhance the efficacy of sentiment classification in data analysis. Comparative experiments conducted using existing data demonstrate that the model employed in this study reaches an accuracy rate of 90.9%, exceeding the best-performing single models and three-model combinations by up to 7.7%, and proves to be highly applicable to online reviews. Full article
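A minimal sketch of a "BERT + BiGRU + softmax" review classifier that also accepts a small vector of additional review features (for example price, delivery-time, or rating fields). The checkpoint name, feature vector, and layer sizes are assumptions; the paper's exact multimodal feature set is not reproduced.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertBiGRUSoftmax(nn.Module):
    def __init__(self, checkpoint="bert-base-chinese", extra_dim=6, hidden=128, num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(checkpoint)
        self.gru = nn.GRU(self.bert.config.hidden_size, hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden + extra_dim, num_classes)

    def forward(self, input_ids, attention_mask, extra):
        tokens = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.gru(tokens)
        pooled = h.mean(dim=1)                              # average BiGRU states over tokens
        return self.head(torch.cat([pooled, extra], dim=-1)).softmax(dim=-1)

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
# Example review: "The food is tasty, but the delivery was too slow."
enc = tok(["菜品很好吃，但是配送太慢了。"], return_tensors="pt", padding=True, truncation=True)
extra = torch.randn(1, 6)                                    # stand-in structured review features
model = BertBiGRUSoftmax()
print(model(enc["input_ids"], enc["attention_mask"], extra))
```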

13 pages, 4445 KB  
Article
Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space
by Peranut Nimitsurachat and Peter Washington
AI 2024, 5(1), 195-207; https://doi.org/10.3390/ai5010011 - 17 Jan 2024
Cited by 3 | Viewed by 4574
Abstract
Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI)’s acoustic data. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data with 74 parameters of distinctive audio features at discrete timesteps. Our model is first pre-trained to uncover the randomly masked timesteps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via overall mean absolute error (MAE), mean absolute error (MAE) per emotion, overall four-class accuracy, and four-class accuracy per emotion. These metrics are compared against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics, especially when the number of annotated data points in the fine-tuning step is small. Furthermore, we quantify the behaviors of the self-supervised model and its convergence as the amount of annotated data increases. This work characterizes the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small and that the effect is most pronounced for emotions which are easier to classify such as happy, sad, and angry. This work further demonstrates that self-supervised learning still improves performance when applied to the embedded feature representations rather than the traditional approach of pre-training on the raw input space. Full article
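A sketch of the self-supervised recipe described above: a small transformer over engineered acoustic feature vectors (74 values per timestep) is pre-trained to reconstruct randomly masked timesteps, and its encoder is then reused with a classification head on the labeled subset. The sizes, mask ratio, masking scheme, and training-loop details are illustrative assumptions.

```python
import torch
import torch.nn as nn

FEAT_DIM, MODEL_DIM, T = 74, 128, 100

class AcousticEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.inp = nn.Linear(FEAT_DIM, MODEL_DIM)
        layer = nn.TransformerEncoderLayer(MODEL_DIM, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.recon = nn.Linear(MODEL_DIM, FEAT_DIM)      # used only during pre-training

    def forward(self, x):
        return self.enc(self.inp(x))

encoder = AcousticEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# Pre-training step: mask ~15% of timesteps and reconstruct their original feature vectors.
x = torch.randn(8, T, FEAT_DIM)                          # stand-in encoded acoustic segments
mask = torch.rand(8, T) < 0.15
x_in = x.clone()
x_in[mask] = 0.0                                         # simple zero-masking of timesteps
recon = encoder.recon(encoder(x_in))
loss = nn.functional.mse_loss(recon[mask], x[mask])
loss.backward(); opt.step(); opt.zero_grad()

# Fine-tuning: reuse the pre-trained encoder with a small emotion head on labeled data.
head = nn.Linear(MODEL_DIM, 4)                           # e.g., four emotion classes
labels = torch.randint(0, 4, (8,))
logits = head(encoder(x).mean(dim=1))
print(nn.functional.cross_entropy(logits, labels).item())
```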

29 pages, 559 KB  
Review
A Brief Survey of Machine Learning and Deep Learning Techniques for E-Commerce Research
by Xue Zhang, Fusen Guo, Tao Chen, Lei Pan, Gleb Beliakov and Jianzhang Wu
J. Theor. Appl. Electron. Commer. Res. 2023, 18(4), 2188-2216; https://doi.org/10.3390/jtaer18040110 - 4 Dec 2023
Cited by 52 | Viewed by 15207
Abstract
The rapid growth of e-commerce has significantly increased the demand for advanced techniques to address specific tasks in the e-commerce field. In this paper, we present a brief survey of machine learning and deep learning techniques in the context of e-commerce, focusing on the years 2018–2023 in a Google Scholar search, with the aim of identifying state-of-the-art approaches, main topics, and potential challenges in the field. We first introduce the applied machine learning and deep learning techniques, spanning from support vector machines, decision trees, and random forests to conventional neural networks, recurrent neural networks, generative adversarial networks, and beyond. Next, we summarize the main topics, including sentiment analysis, recommendation systems, fake review detection, fraud detection, customer churn prediction, customer purchase behavior prediction, prediction of sales, product classification, and image recognition. Finally, we discuss the main challenges and trends, which are related to imbalanced data, over-fitting and generalization, multi-modal learning, interpretability, personalization, chatbots, and virtual assistance. This survey offers a concise overview of the current state and future directions regarding the use of machine learning and deep learning techniques in the context of e-commerce. Further research and development will be necessary to address the evolving challenges and opportunities presented by the dynamic e-commerce landscape. Full article
(This article belongs to the Section e-Commerce Analytics)