Search Results (538)

Search Parameters:
Keywords = human emotion recognition

12 pages, 1419 KB  
Proceeding Paper
A Real-Time Intelligent Surveillance System for Suspicious Behavior and Facial Emotion Analysis Using YOLOv8 and DeepFace
by Uswa Ihsan, Noor Zaman Jhanjhi, Humaira Ashraf, Farzeen Ashfaq and Fikri Arif Wicaksana
Eng. Proc. 2025, 107(1), 59; https://doi.org/10.3390/engproc2025107059 - 4 Sep 2025
Abstract
This study describes the development of a deep-learning-based intelligent surveillance system that aims to improve real-time security monitoring by automatically identifying suspicious activity. Using modern computer vision techniques, the proposed system addresses the drawbacks of conventional surveillance, which depends on human observation to spot irregularities in public spaces. The system performs motion detection, trajectory analysis, and emotion recognition, using the YOLOv8 model for object detection and DeepFace for facial emotion analysis. Roboflow is used for dataset annotation, model training with optimized parameters, and visualization of object trajectories and detection confidence. The findings show that abnormal behaviors can be identified accurately, with notable observations about the emotional expressions and movement patterns of individuals deemed to be threats. Although the system performs well in real time, issues such as misclassification, limited model explainability, and a lack of dataset diversity remain. Future research will focus on integrating multimodal data fusion, deeper models, and temporal sequence analysis to further enhance detection robustness and system intelligence. Full article
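
Implementation details are not part of the listing, but the detection-then-emotion pipeline named in the abstract can be sketched roughly as follows, assuming the `ultralytics` and `deepface` packages; the confidence threshold and the person-class filter are illustrative choices, not the authors':

```python
# Sketch: per-frame object detection (YOLOv8) followed by facial emotion analysis (DeepFace).
# Assumes the `ultralytics` and `deepface` packages; threshold and "person" filter are illustrative.
import cv2
from ultralytics import YOLO
from deepface import DeepFace

detector = YOLO("yolov8n.pt")  # pretrained detection weights

def analyze_frame(frame):
    """Return (box, dominant_emotion) pairs for persons detected in a BGR frame."""
    results = detector(frame, verbose=False)[0]
    outputs = []
    for box in results.boxes:
        if detector.names[int(box.cls)] != "person" or float(box.conf) < 0.5:
            continue
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        crop = frame[y1:y2, x1:x2]
        # enforce_detection=False keeps the pipeline running when no face is found in the crop;
        # recent deepface versions return a list of result dicts
        report = DeepFace.analyze(crop, actions=["emotion"], enforce_detection=False)
        outputs.append(((x1, y1, x2, y2), report[0]["dominant_emotion"]))
    return outputs

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        print(analyze_frame(frame))
    cap.release()
```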

17 pages, 2671 KB  
Article
Evaluating Emotional Response and Effort in Nautical Simulation Training Using Noninvasive Methods
by Dejan Žagar
Sensors 2025, 25(17), 5508; https://doi.org/10.3390/s25175508 - 4 Sep 2025
Abstract
The purpose of this study is to investigate emotional labor and cognitive effort in radar-based collision-avoidance tasks within a nautical simulator. By assessing participants’ emotional responses and mental strain, the research aimed to identify negative emotional states associated with a lack of experience, which, in the worst case, could contribute to navigational incidents. Fifteen participants engaged in multiple sessions simulating typical maritime conditions and navigation challenges. Emotional and cognitive effort were evaluated using three primary methods: heart rate monitoring, a Likert-scale questionnaire, and real-time facial expression recognition software. Heart rate data provided physiological indicators of stress, while the questionnaire and facial expressions captured subjective perceptions of difficulty and emotional strain. By correlating these measurements, the study aimed to uncover emotional patterns linked to task difficulty, with insight into engagement, attention, and blink rate during the simulation, revealing how a lack of experience contributes to negative emotions and human factor errors. Understanding emotional labor and effort in maritime navigation training informs strategies for reducing incident risk through improved simulation training practices. Full article
(This article belongs to the Special Issue Non-Intrusive Sensors for Human Activity Detection and Recognition)
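
As a rough illustration of the kind of analysis the abstract describes (relating a physiological stress indicator to self-reported difficulty), not the study's actual protocol; the data frame layout and values are hypothetical:

```python
# Illustrative only: correlate mean heart rate per session with Likert-scale difficulty ratings.
# Column names and values are hypothetical, not taken from the study.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

sessions = pd.DataFrame({
    "mean_hr_bpm":       [72, 81, 95, 88, 78],  # heart rate during the collision-avoidance task
    "likert_difficulty": [2,  3,  5,  4,  3],   # self-reported difficulty (1-5)
})

r, p = pearsonr(sessions["mean_hr_bpm"], sessions["likert_difficulty"])
rho, p_s = spearmanr(sessions["mean_hr_bpm"], sessions["likert_difficulty"])
print(f"Pearson r={r:.2f} (p={p:.3f}); Spearman rho={rho:.2f} (p={p_s:.3f})")
```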

10 pages, 1081 KB  
Proceeding Paper
Insights into the Emotion Classification of Artificial Intelligence: Evolution, Application, and Obstacles of Emotion Classification
by Marselina Endah Hiswati, Ema Utami, Kusrini Kusrini and Arief Setyanto
Eng. Proc. 2025, 103(1), 24; https://doi.org/10.3390/engproc2025103024 - 3 Sep 2025
Viewed by 83
Abstract
In this systematic literature review, we examined the integration of emotional intelligence into artificial intelligence (AI) systems, focusing on advancements, challenges, and opportunities in emotion classification technologies. Accurate emotion recognition in AI holds immense potential in healthcare, the IoT, and education. However, challenges such as computational demands, limited dataset diversity, and real-time deployment complexity remain significant. In this review, we also cover emerging solutions such as multimodal data processing, attention mechanisms, and real-time emotion tracking that address these challenges. Overcoming them would allow AI systems to enhance human–AI interaction and expand real-world applications. Recommendations for improving accuracy and scalability in emotion-aware AI are provided based on the review results. Full article

23 pages, 3668 KB  
Article
Graph-Driven Micro-Expression Rendering with Emotionally Diverse Expressions for Lifelike Digital Humans
by Lei Fang, Fan Yang, Yichen Lin, Jing Zhang and Mincheol Whang
Biomimetics 2025, 10(9), 587; https://doi.org/10.3390/biomimetics10090587 - 3 Sep 2025
Viewed by 161
Abstract
Micro-expressions, characterized by brief and subtle facial muscle movements, are essential for conveying nuanced emotions in digital humans, yet existing rendering techniques often produce rigid or emotionally monotonous animations due to inadequate modeling of temporal dynamics and action unit interdependencies. This paper proposes a graph-driven framework for micro-expression rendering that generates emotionally diverse and lifelike expressions. We employ a 3D-ResNet-18 backbone network to perform joint spatio-temporal feature extraction from facial video sequences, enhancing sensitivity to transient motion cues. Action units (AUs) are modeled as nodes in a symmetric graph, with edge weights derived from empirical co-occurrence probabilities and processed by a graph convolutional network to capture structural dependencies and symmetric interactions. This symmetric formulation is justified by the bilateral nature of human facial anatomy: AU relationships derived from co-occurrence statistics and facial anatomy analysis (per the FACS) are inherently undirected, which aligns with classic spectral GCNs for undirected graphs, whose adjacency matrices are assumed to be symmetric so that non-directional co-occurrences are modeled effectively. Predicted AU activations and timestamps are interpolated into continuous motion curves using B-spline functions and mapped to skeletal controls within a real-time animation pipeline (Unreal Engine). Experiments on the CASME II dataset demonstrate superior performance, achieving an F1-score of 77.93% and an accuracy of 84.80% (5-fold cross-validation), outperforming baselines in temporal segmentation. Subjective evaluations confirm that the rendered digital human exhibits improvements in perceptual clarity, naturalness, and realism. This approach bridges micro-expression recognition and high-fidelity facial animation, enabling more expressive virtual interactions through curves extracted from AU values and timestamps. Full article
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
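
The symmetric-adjacency idea can be sketched as a single normalized GCN propagation step over an AU co-occurrence graph; this is a generic PyTorch sketch with illustrative values, not the paper's model:

```python
# Sketch: one GCN propagation step over a symmetric AU co-occurrence graph.
# The co-occurrence matrix below is illustrative, not the values used in the paper.
import torch
import torch.nn as nn

co_occurrence = torch.tensor([           # symmetric AU-AU co-occurrence probabilities
    [0.0, 0.6, 0.1],
    [0.6, 0.0, 0.4],
    [0.1, 0.4, 0.0],
])
A = co_occurrence + torch.eye(3)         # add self-loops
deg_inv_sqrt = torch.diag(A.sum(1).rsqrt())
A_hat = deg_inv_sqrt @ A @ deg_inv_sqrt  # symmetric normalization D^-1/2 A D^-1/2

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):
        return torch.relu(a_hat @ self.linear(x))

au_features = torch.randn(3, 16)         # per-AU features from a spatio-temporal backbone (placeholder)
print(GCNLayer(16, 8)(au_features, A_hat).shape)   # -> torch.Size([3, 8])
```

The subsequent B-spline interpolation of predicted AU activations over time could be prototyped with, for example, `scipy.interpolate.make_interp_spline`, though the paper's exact curve-fitting settings are not given in the listing.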

25 pages, 4433 KB  
Article
Mathematical Analysis and Performance Evaluation of CBAM-DenseNet121 for Speech Emotion Recognition Using the CREMA-D Dataset
by Zineddine Sarhani Kahhoul, Nadjiba Terki, Ilyes Benaissa, Khaled Aldwoah, E. I. Hassan, Osman Osman and Djamel Eddine Boukhari
Appl. Sci. 2025, 15(17), 9692; https://doi.org/10.3390/app15179692 - 3 Sep 2025
Viewed by 182
Abstract
Emotion recognition from speech is essential for human–computer interaction (HCI) and affective computing, with applications in virtual assistants, healthcare, and education. Although deep learning has brought significant advances to Automatic Speech Emotion Recognition (ASER), the task remains challenging due to speaker variation, subtle emotional expressions, and environmental noise. Practical deployment in this context depends on a robust, fast, and scalable recognition system. This work introduces a new framework that combines DenseNet121, fine-tuned for the crowd-sourced emotional multimodal actors dataset (CREMA-D), with the convolutional block attention module (CBAM). While DenseNet121’s efficient feature propagation captures rich, hierarchical patterns in the speech data, CBAM sharpens the model’s focus on emotionally significant elements by applying both spatial and channel-wise attention. In addition, an advanced preprocessing pipeline, including log-Mel spectrogram transformation and normalization, enhances the input spectrograms and strengthens resistance to environmental noise. The proposed model demonstrates superior performance. To ensure a robust evaluation under class imbalance, we report an Unweighted Average Recall (UAR) of 71.01% and an F1 score of 71.25%; the model also achieves a test accuracy of 71.26% and a precision of 71.30%. These results establish the model as a promising solution for real-world speech emotion detection, highlighting its strong generalization, computational efficiency, and focus on emotion-specific features compared with recent work. The improvements demonstrate practical flexibility, enabling the integration of established image recognition techniques and allowing substantial adaptability across application contexts. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
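
The preprocessing step described above can be approximated with a short log-Mel pipeline; `librosa` is assumed, and the sampling rate, Mel-band count, and normalization are illustrative rather than the paper's exact settings:

```python
# Sketch of a log-Mel preprocessing step (librosa assumed; parameters illustrative).
import numpy as np
import librosa

def log_mel(path, sr=16000, n_mels=128):
    """Load an utterance and return a normalized log-Mel spectrogram."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_spec = librosa.power_to_db(mel, ref=np.max)
    # per-utterance normalization, as a stand-in for the paper's normalization step
    return (log_spec - log_spec.mean()) / (log_spec.std() + 1e-8)

# spec = log_mel("crema_d_clip.wav")  # shape: (n_mels, frames), ready for a CNN backbone
```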

22 pages, 47099 KB  
Article
Deciphering Emotions in Children’s Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
by Bushra Asseri, Estabrag Abaker, Maha Al Mogren, Tayef Alhefdhi and Areej Al-Wabil
AI 2025, 6(9), 211; https://doi.org/10.3390/ai6090211 - 2 Sep 2025
Viewed by 256
Abstract
Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies yet remain underexplored for Arabic language contexts, where culturally appropriate learning tools are critically needed. This study evaluated the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children’s storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik’s emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting compared to Gemini’s best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models’ cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners. Full article
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)
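
A minimal sketch of how model predictions might be scored against human annotations with macro F1 (scikit-learn assumed; the label values are placeholders, not the study's data):

```python
# Sketch: scoring model emotion predictions against human annotations with macro F1.
# The label values below are placeholders, not the study's annotations.
from sklearn.metrics import f1_score, confusion_matrix

human_labels = ["joy", "sadness", "fear", "joy", "anger"]    # Plutchik-style annotations
model_preds  = ["joy", "fear",    "fear", "joy", "sadness"]  # e.g., chain-of-thought model output

labels = sorted(set(human_labels) | set(model_preds))
print("macro F1:", f1_score(human_labels, model_preds, average="macro"))
print(confusion_matrix(human_labels, model_preds, labels=labels))
```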

20 pages, 3439 KB  
Article
Multimodal Emotion Recognition Based on Graph Neural Networks
by Zhongwen Tu, Raoxin Yan, Sihan Weng, Jiatong Li and Wei Zhao
Appl. Sci. 2025, 15(17), 9622; https://doi.org/10.3390/app15179622 - 1 Sep 2025
Viewed by 333
Abstract
Emotion recognition remains a challenging task in human–computer interaction. With advances in multimodal computing, multimodal emotion recognition has become increasingly important. To address existing limitations in multimodal fusion efficiency, emotional–semantic association mining, and long-range context modeling, we propose an innovative graph neural network (GNN)-based framework. Our methodology integrates three key components: (1) a hierarchical sequential fusion (HSF) multimodal integration approach, (2) a sentiment–emotion enhanced joint learning framework, and (3) a context-similarity dual-layer graph architecture (CS-BiGraph). The experimental results demonstrate that our method achieves 69.1% accuracy on the IEMOCAP dataset, establishing new state-of-the-art performance. In future work, we will explore robust extensions of our framework under real-world scenarios with higher noise levels and investigate the integration of emerging modalities for broader applicability. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
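
The HSF module itself is not specified in the listing; the following is one generic reading of "hierarchical sequential fusion" (merge two modalities first, then fuse the result with the third), with illustrative feature dimensions rather than the authors' architecture:

```python
# Illustrative reading of hierarchical sequential fusion: merge text and audio first,
# then fuse the result with video. A generic sketch, not the paper's HSF module.
import torch
import torch.nn as nn

class SequentialFusion(nn.Module):
    def __init__(self, d_text=768, d_audio=128, d_video=512, d_hidden=256, n_classes=6):
        super().__init__()
        self.stage1 = nn.Linear(d_text + d_audio, d_hidden)    # text + audio first
        self.stage2 = nn.Linear(d_hidden + d_video, d_hidden)  # then add video
        self.classifier = nn.Linear(d_hidden, n_classes)

    def forward(self, text, audio, video):
        h = torch.relu(self.stage1(torch.cat([text, audio], dim=-1)))
        h = torch.relu(self.stage2(torch.cat([h, video], dim=-1)))
        return self.classifier(h)

model = SequentialFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 6])
```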

15 pages, 1780 KB  
Article
Prosodic Spatio-Temporal Feature Fusion with Attention Mechanisms for Speech Emotion Recognition
by Kristiawan Nugroho, Imam Husni Al Amin, Nina Anggraeni Noviasari and De Rosal Ignatius Moses Setiadi
Computers 2025, 14(9), 361; https://doi.org/10.3390/computers14090361 - 31 Aug 2025
Viewed by 284
Abstract
Speech Emotion Recognition (SER) plays a vital role in supporting applications such as healthcare, human–computer interaction, and security. However, many existing approaches still face challenges in achieving robust generalization and maintaining high recall, particularly for emotions related to stress and anxiety. This study proposes a dual-stream hybrid model that combines prosodic features with spatio-temporal representations derived from the Multitaper Mel-Frequency Spectrogram (MTMFS) and the Constant-Q Transform Spectrogram (CQTS). Prosodic cues, including pitch, intensity, jitter, shimmer, HNR, pause rate, and speech rate, were processed using dense layers, while MTMFS and CQTS features were encoded with CNN and BiGRU. A Multi-Head Attention mechanism was then applied to adaptively fuse the two feature streams, allowing the model to focus on the most relevant emotional cues. Evaluations conducted on the RAVDESS dataset with subject-independent 5-fold cross-validation demonstrated an accuracy of 97.64% and a macro F1-score of 0.9745. These results confirm that combining prosodic and advanced spectrogram features with attention-based fusion improves precision, recall, and overall robustness, offering a promising framework for more reliable SER systems. Full article
(This article belongs to the Special Issue Multimodal Pattern Recognition of Social Signals in HCI (2nd Edition))
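
A compact sketch of the dual-stream design described above, with a dense branch for prosodic features, a CNN+BiGRU branch over spectrogram frames, and multi-head attention fusion; all layer sizes are illustrative rather than those of the paper:

```python
# Sketch: dual-stream SER with a prosodic dense branch, a CNN+BiGRU spectrogram branch,
# and multi-head attention fusion. Dimensions are illustrative.
import torch
import torch.nn as nn

class DualStreamSER(nn.Module):
    def __init__(self, n_prosodic=7, n_mels=64, d_model=128, n_classes=8):
        super().__init__()
        self.prosodic_net = nn.Sequential(nn.Linear(n_prosodic, d_model), nn.ReLU())
        self.cnn = nn.Sequential(nn.Conv1d(n_mels, d_model, kernel_size=3, padding=1), nn.ReLU())
        self.bigru = nn.GRU(d_model, d_model // 2, batch_first=True, bidirectional=True)
        self.fusion = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, prosodic, spectrogram):             # spectrogram: (B, n_mels, T)
        query = self.prosodic_net(prosodic).unsqueeze(1)  # (B, 1, d_model)
        seq = self.cnn(spectrogram).transpose(1, 2)       # (B, T, d_model)
        seq, _ = self.bigru(seq)                          # (B, T, d_model)
        fused, _ = self.fusion(query, seq, seq)           # attend over spectrogram frames
        return self.classifier(fused.squeeze(1))

model = DualStreamSER()
print(model(torch.randn(2, 7), torch.randn(2, 64, 100)).shape)  # torch.Size([2, 8])
```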

30 pages, 2050 KB  
Article
An Ensemble Learning Approach for Facial Emotion Recognition Based on Deep Learning Techniques
by Manal Almubarak and Fawaz A. Alsulaiman
Electronics 2025, 14(17), 3415; https://doi.org/10.3390/electronics14173415 - 27 Aug 2025
Viewed by 467
Abstract
Facial emotion recognition (FER) is an evolving sub-field of computer vision and affective computing. It entails the development of algorithms and models to detect, analyze, and interpret facial expressions, thereby determining individuals’ emotional states. This paper explores the effectiveness of transfer learning using the EfficientNet-B0 convolutional neural network for FER, alongside the use of stacking techniques. The pretrained EfficientNet-B0 model is trained on a dataset comprising a diverse range of natural human face images for emotion recognition; the dataset consists of grayscale images categorized into eight distinct emotion classes. Our approach involves fine-tuning the pretrained EfficientNet-B0 model, adapting its weights and layers to capture subtle facial expressions. Moreover, this study employs ensemble learning by integrating transfer learning from pretrained models, a strategic tuning approach, binary classifiers, and a meta-classifier. Our approach achieves superior performance in accurately identifying and classifying emotions within facial images. Experimental results for the meta-classifier demonstrate 100% accuracy on the test set. For further assessment, we also train our meta-classifier on the Extended Cohn–Kanade (CK+) dataset, achieving 92% accuracy on its test set. These findings highlight the effectiveness and potential of employing transfer learning and stacking techniques with EfficientNet-B0 for FER tasks. Full article
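
The transfer-learning starting point can be sketched as follows (torchvision assumed); freezing the backbone and the grayscale-to-RGB handling are illustrative choices, and the paper's stacking stage with binary classifiers and a meta-classifier is omitted here:

```python
# Sketch: fine-tuning a pretrained EfficientNet-B0 for 8 emotion classes (torchvision assumed).
# Layer-freezing and grayscale handling are illustrative, not the paper's exact recipe.
import torch
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
for p in model.features.parameters():   # freeze the backbone; train only the new head at first
    p.requires_grad = False
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 8)  # 8 emotion classes

x = torch.randn(4, 3, 224, 224)  # grayscale inputs would be repeated across the 3 channels
print(model(x).shape)            # torch.Size([4, 8])
```

A stacking stage could then combine several such backbones, for example with scikit-learn's StackingClassifier over their per-class outputs; that part is not sketched here.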

23 pages, 3014 KB  
Article
Multimodal Emotion Recognition for Seafarers: A Framework Integrating Improved D-S Theory and Calibration: A Case Study of a Real Navigation Experiment
by Liu Yang, Junzhang Yang, Chengdeng Cao, Mingshuang Li, Peng Fei and Qing Liu
Appl. Sci. 2025, 15(17), 9253; https://doi.org/10.3390/app15179253 - 22 Aug 2025
Viewed by 363
Abstract
Seafarers’ emotions influence work performance and can contribute to severe marine accidents. However, research on emotion recognition (ER) for seafarers remains insufficient, and existing studies deploy only single models and disregard model uncertainty, which can lead to unreliable recognition. In this paper, a novel fusion framework for seafarer ER is proposed. First, feature-level fusion was conducted using Electroencephalogram (EEG) and navigation data collected in a real navigation environment, and calibration was employed to mitigate the uncertainty of the outcomes. Second, a weight combination strategy for decision fusion was designed. Finally, we conducted a series of evaluations of the proposed model. The results showed that the average recognition performance across the three emotional dimensions, as measured by accuracy, precision, recall, and F1 score, reached 85.14%, 84.43%, 86.27%, and 85.33%, respectively. These results demonstrate that physiological and navigation data can effectively identify seafarers’ emotional states. In addition, the fusion model compensates for the uncertainty of single models and improves ER performance for seafarers, providing a feasible path for seafarer ER. The findings can be used to promptly identify seafarers’ emotional states, to develop early warnings for shipping companies’ bridge systems, and to inform policy-making on human factors that enhances maritime safety. Full article
(This article belongs to the Section Marine Science and Engineering)
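
The decision-fusion step builds on Dempster–Shafer theory; as a reference point, classic Dempster combination of two mass functions (e.g., an EEG-based and a navigation-data-based classifier) looks roughly like this, with illustrative masses and without the paper's improvements or calibration:

```python
# Sketch: Dempster's rule of combination for two sources over one emotional dimension.
# Mass values are illustrative; the paper's improved D-S scheme and calibration are not shown.
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions defined over frozenset hypotheses."""
    combined, conflict = {}, 0.0
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + p * q
        else:
            conflict += p * q
    if conflict >= 1.0:
        raise ValueError("Total conflict; Dempster's rule is undefined.")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

POS, NEG = frozenset({"positive"}), frozenset({"negative"})
EITHER = POS | NEG                                  # mass assigned to ignorance
m_eeg = {POS: 0.6, NEG: 0.1, EITHER: 0.3}
m_nav = {POS: 0.5, NEG: 0.2, EITHER: 0.3}
print(dempster_combine(m_eeg, m_nav))
```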

28 pages, 1036 KB  
Review
Recent Advances in Portable Dry Electrode EEG: Architecture and Applications in Brain-Computer Interfaces
by Meihong Zhang, Bocheng Qian, Jianming Gao, Shaokai Zhao, Yibo Cui, Zhiguo Luo, Kecheng Shi and Erwei Yin
Sensors 2025, 25(16), 5215; https://doi.org/10.3390/s25165215 - 21 Aug 2025
Viewed by 1040
Abstract
As brain–computer interface (BCI) technology continues to advance, research on human brain function has gradually transitioned from theoretical investigation to practical engineering applications. To support EEG signal acquisition in a variety of real-world scenarios, BCI electrode systems must demonstrate a balanced combination of electrical performance, wearing comfort, and portability. Dry electrodes have emerged as a promising alternative for EEG acquisition due to their ability to operate without conductive gel or complex skin preparation. This paper reviews the latest progress in dry electrode EEG systems, summarizing key achievements in hardware design with a focus on structural innovation and material development. It also examines application advances in several representative BCI domains, including emotion recognition, fatigue and drowsiness detection, motor imagery, and steady-state visual evoked potentials, while analyzing system-level performance. Finally, the paper critically assesses existing challenges and identifies critical future research priorities. Key recommendations include developing a standardized evaluation framework to bolster research reliability, enhancing generalization performance, and fostering coordinated hardware-algorithm optimization. These steps are crucial for advancing the practical implementation of these technologies across diverse scenarios. With this survey, we aim to offer a comprehensive reference and roadmap for researchers engaged in the development and implementation of next-generation dry electrode EEG-based BCI systems. Full article

23 pages, 811 KB  
Article
Efficient Dynamic Emotion Recognition from Facial Expressions Using Statistical Spatio-Temporal Geometric Features
by Yacine Yaddaden
Big Data Cogn. Comput. 2025, 9(8), 213; https://doi.org/10.3390/bdcc9080213 - 19 Aug 2025
Viewed by 630
Abstract
Automatic Facial Expression Recognition (AFER) is a key component of affective computing, enabling machines to recognize and interpret human emotions across various applications such as human–computer interaction, healthcare, entertainment, and social robotics. Dynamic AFER systems, which exploit image sequences, can capture the temporal evolution of facial expressions but often suffer from high computational costs, limiting their suitability for real-time use. In this paper, we propose an efficient dynamic AFER approach based on a novel spatio-temporal representation. Facial landmarks are extracted, and all possible Euclidean distances are computed to model the spatial structure. To capture temporal variations, three statistical metrics are applied to each distance sequence. A feature selection stage based on the Extremely Randomized Trees (ExtRa-Trees) algorithm is then performed to reduce dimensionality and enhance classification performance. Finally, the emotions are classified using a linear multi-class Support Vector Machine (SVM) and compared against the k-Nearest Neighbors (k-NN) method. The proposed approach is evaluated on three benchmark datasets: CK+, MUG, and MMI, achieving recognition rates of 94.65%, 93.98%, and 75.59%, respectively. Our results demonstrate that the proposed method achieves a strong balance between accuracy and computational efficiency, making it well-suited for real-time facial expression recognition applications. Full article
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
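
The pipeline described above maps naturally onto standard tooling; a sketch assuming scikit-learn and SciPy, with mean, standard deviation, and range as three illustrative temporal statistics and dummy data in place of CK+/MUG/MMI sequences:

```python
# Sketch: pairwise landmark distances per frame, three temporal statistics per distance,
# ExtRa-Trees-based feature selection, then a linear SVM. Data and statistic choice are illustrative.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def sequence_features(landmark_seq):
    """landmark_seq: (n_frames, n_landmarks, 2) -> one feature vector per sequence."""
    dists = np.stack([pdist(frame) for frame in landmark_seq])   # (n_frames, n_pairs)
    return np.concatenate([dists.mean(0), dists.std(0), np.ptp(dists, axis=0)])

rng = np.random.default_rng(0)
X = np.stack([sequence_features(rng.normal(size=(20, 68, 2))) for _ in range(40)])  # dummy sequences
y = rng.integers(0, 6, size=40)                                                     # dummy emotion labels

clf = make_pipeline(
    SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0)),
    LinearSVC(),
)
clf.fit(X, y)
print(clf.predict(X[:3]))
```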

23 pages, 10088 KB  
Article
Development of an Interactive Digital Human with Context-Sensitive Facial Expressions
by Fan Yang, Lei Fang, Rui Suo, Jing Zhang and Mincheol Whang
Sensors 2025, 25(16), 5117; https://doi.org/10.3390/s25165117 - 18 Aug 2025
Viewed by 583
Abstract
With the increasing complexity of human–computer interaction scenarios, conventional digital human facial expression systems show notable limitations in handling multi-emotion co-occurrence, dynamic expression, and semantic responsiveness. This paper proposes a digital human system framework that integrates multimodal emotion recognition and compound facial expression generation. The system establishes a complete pipeline for real-time interaction and compound emotional expression, following a sequence of “speech semantic parsing—multimodal emotion recognition—Action Unit (AU)-level 3D facial expression control.” First, a ResNet18-based model is employed for robust emotion classification using the AffectNet dataset. Then, an AU motion curve driving module is constructed on the Unreal Engine platform, where dynamic synthesis of basic emotions is achieved via a state-machine mechanism. Finally, Generative Pre-trained Transformer (GPT) is utilized for semantic analysis, generating structured emotional weight vectors that are mapped to the AU layer to enable language-driven facial responses. Experimental results demonstrate that the proposed system significantly improves facial animation quality, with naturalness increasing from 3.54 to 3.94 and semantic congruence from 3.44 to 3.80. These results validate the system’s capability to generate realistic and emotionally coherent expressions in real time. This research provides a complete technical framework and practical foundation for high-fidelity digital humans with affective interaction capabilities. Full article
(This article belongs to the Special Issue Emotion Recognition Based on Sensors (3rd Edition))
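
The mapping from a structured emotional weight vector to AU-level controls could look roughly like the following; the emotion-to-AU table is a simplified illustration based on common FACS associations, not the paper's mapping:

```python
# Illustrative mapping from an emotion-weight vector (as might come from the semantic-analysis
# step) to AU activation values. The AU table is a simplification, not the paper's.
EMOTION_TO_AUS = {
    "joy":      {"AU6": 1.0, "AU12": 1.0},                       # cheek raiser, lip corner puller
    "sadness":  {"AU1": 1.0, "AU4": 0.6, "AU15": 1.0},
    "surprise": {"AU1": 1.0, "AU2": 1.0, "AU5": 0.8, "AU26": 1.0},
}

def emotion_weights_to_au(weights):
    """Blend per-emotion AU templates by the emotion weights (e.g., {'joy': 0.7, 'surprise': 0.3})."""
    au = {}
    for emotion, w in weights.items():
        for unit, strength in EMOTION_TO_AUS.get(emotion, {}).items():
            au[unit] = min(1.0, au.get(unit, 0.0) + w * strength)
    return au

print(emotion_weights_to_au({"joy": 0.7, "surprise": 0.3}))
```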

30 pages, 4741 KB  
Article
TriViT-Lite: A Compact Vision Transformer–MobileNet Model with Texture-Aware Attention for Real-Time Facial Emotion Recognition in Healthcare
by Waqar Riaz, Jiancheng (Charles) Ji and Asif Ullah
Electronics 2025, 14(16), 3256; https://doi.org/10.3390/electronics14163256 - 16 Aug 2025
Viewed by 355
Abstract
Facial emotion recognition has become increasingly important in healthcare, where understanding subtle cues like pain, discomfort, or unconsciousness can support more timely and responsive care. Yet recognizing facial expressions in real-world settings remains challenging due to varying lighting, facial occlusions, and hardware limitations in clinical environments. To address this, we propose TriViT-Lite, a lightweight yet powerful model that blends three complementary components: MobileNet, for capturing fine-grained local features efficiently; Vision Transformers (ViT), for modeling global facial patterns; and handcrafted texture descriptors, such as Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG), for added robustness. These multi-scale features are brought together through a texture-aware cross-attention fusion mechanism that helps the model focus dynamically on the most relevant facial regions. TriViT-Lite is evaluated on both benchmark datasets (FER2013, AffectNet) and a custom healthcare-oriented dataset covering seven critical emotional states, including pain and unconsciousness. It achieves a competitive accuracy of 91.8% on FER2013 and 87.5% on the custom dataset while maintaining real-time performance (~15 FPS) on resource-constrained edge devices. Our results show that TriViT-Lite offers a practical and accurate solution for real-time emotion recognition, particularly in healthcare settings. It strikes a balance between performance, interpretability, and efficiency, making it a strong candidate for machine-learning-driven pattern recognition in patient-monitoring applications. Full article
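
The handcrafted texture branch can be sketched with scikit-image's LBP and HOG implementations; the parameter choices below are illustrative, not those of TriViT-Lite:

```python
# Sketch of a handcrafted texture descriptor (LBP histogram + HOG) using scikit-image.
# Parameters are illustrative, not TriViT-Lite's configuration.
import numpy as np
from skimage.feature import local_binary_pattern, hog

def texture_descriptor(gray_face):
    """gray_face: 2-D uint8 array of a cropped face; returns a 1-D texture feature vector."""
    lbp = local_binary_pattern(gray_face, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_vec = hog(gray_face, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([lbp_hist, hog_vec])

face = (np.random.default_rng(0).random((96, 96)) * 255).astype(np.uint8)  # placeholder face crop
print(texture_descriptor(face).shape)
```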

27 pages, 2985 KB  
Article
FPGA Chip Design of Sensors for Emotion Detection Based on Consecutive Facial Images by Combining CNN and LSTM
by Shing-Tai Pan and Han-Jui Wu
Electronics 2025, 14(16), 3250; https://doi.org/10.3390/electronics14163250 - 15 Aug 2025
Viewed by 467
Abstract
This paper proposes emotion recognition methods for consecutive facial images and implements the inference of a neural network model on a field-programmable gate array (FPGA) for real-time sensing of human emotion. The proposed emotion recognition methods are based on a neural network architecture called the Convolutional Long Short-Term Memory Fully Connected Deep Neural Network (CLDNN), which combines convolutional neural networks (CNNs) for spatial feature extraction, long short-term memory (LSTM) for temporal modeling, and fully connected neural networks (FCNNs) for final classification. This architecture can analyze the local feature sequences obtained through convolution of the data, making it suitable for processing time-series data such as consecutive facial images. The method achieves an average recognition rate of 99.51% on the RAVDESS database, 87.80% on the BAUM-1s database, and 96.82% on the eNTERFACE’05 database, using 10-fold cross-validation on a personal computer (PC). The comparisons in this paper show that our methods outperform existing related work in recognition accuracy. The same model is implemented on an FPGA chip, where it achieves accuracy identical to that on the PC, confirming both its effectiveness and its hardware compatibility. Full article
(This article belongs to the Special Issue Lab-on-Chip Biosensors)
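
A CLDNN-style model (CNN for per-frame features, LSTM for temporal modeling, fully connected layers for classification) can be sketched in PyTorch as follows; layer sizes are illustrative and unrelated to the FPGA configuration reported in the paper:

```python
# Sketch: CLDNN-style network (CNN -> LSTM -> fully connected) for sequences of face images.
# Layer sizes are illustrative, not the synthesized FPGA configuration.
import torch
import torch.nn as nn

class CLDNN(nn.Module):
    def __init__(self, n_classes=8):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame spatial features
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.lstm = nn.LSTM(32 * 4 * 4, 128, batch_first=True)   # temporal modeling across frames
        self.fc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, clips):                          # clips: (B, T, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (B*T, 512)
        _, (h, _) = self.lstm(feats.view(b, t, -1))
        return self.fc(h[-1])                              # classify from the last hidden state

model = CLDNN()
print(model(torch.randn(2, 16, 1, 48, 48)).shape)      # torch.Size([2, 8])
```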
