Search Results (106)

Search Parameters:
Keywords = facial landmark detection

23 pages, 28832 KiB  
Article
Micro-Expression-Based Facial Analysis for Automated Pain Recognition in Dairy Cattle: An Early-Stage Evaluation
by Shuqiang Zhang, Kashfia Sailunaz and Suresh Neethirajan
AI 2025, 6(9), 199; https://doi.org/10.3390/ai6090199 - 22 Aug 2025
Abstract
Timely, objective pain recognition in dairy cattle is essential for welfare assurance, productivity, and ethical husbandry yet remains elusive because evolutionary pressure renders bovine distress signals brief and inconspicuous. Without verbal self-reporting, cows suppress overt cues, so automated vision is indispensable for on-farm triage. Although earlier systems tracked whole-body posture or static grimace scales, frame-level detection of facial micro-expressions has not been explored fully in livestock. We translate micro-expression analytics from automotive driver monitoring to the barn, linking modern computer vision with veterinary ethology. Our two-stage pipeline first detects faces and 30 landmarks using a custom You Only Look Once (YOLO) version 8-Pose network, achieving a 96.9% mean average precision (mAP) at an Intersection over the Union (IoU) threshold of 0.50 for detection and 83.8% Object Keypoint Similarity (OKS) for keypoint placement. Cropped eye, ear, and muzzle patches are encoded using a pretrained MobileNetV2, generating 3840-dimensional descriptors that capture millisecond muscle twitches. Sequences of five consecutive frames are fed into a 128-unit Long Short-Term Memory (LSTM) classifier that outputs pain probabilities. On a held-out validation set of 1700 frames, the system records 99.65% accuracy and an F1-score of 0.997, with only three false positives and three false negatives. Tested on 14 unseen barn videos, it attains 64.3% clip-level accuracy (i.e., overall accuracy for the whole video clip) and 83% precision for the pain class, using a hybrid aggregation rule that combines a 30% mean probability threshold with micro-burst counting to temper false alarms. As an early exploration from our proof-of-concept study on a subset of our custom dairy farm datasets, these results show that micro-expression mining can deliver scalable, non-invasive pain surveillance across variations in illumination, camera angle, background, and individual morphology. Future work will explore attention-based temporal pooling, curriculum learning for variable window lengths, domain-adaptive fine-tuning, and multimodal fusion with accelerometry on the complete datasets to elevate the performance toward clinical deployment. Full article
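The abstract describes a two-stage pipeline: pose-based face and landmark detection, per-patch MobileNetV2 embeddings (three patches × 1280 = 3840 dimensions), and a 128-unit LSTM over five-frame windows. A minimal sketch of that structure is below; the weight file name, keypoint grouping, and preprocessing are illustrative assumptions, not the authors' released code.

```python
import cv2
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from ultralytics import YOLO

detector = YOLO("cow_face_yolov8_pose.pt")            # hypothetical custom pose weights
backbone = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).features.eval()

def patch_descriptor(frame_bgr):
    """Crop eye/ear/muzzle patches around detected keypoints -> 3 x 1280 = 3840-D vector."""
    res = detector(frame_bgr, verbose=False)[0]
    if res.keypoints is None or len(res.keypoints.xy) == 0:
        return None
    kpts = res.keypoints.xy[0]                        # (30, 2); the grouping below is assumed
    feats = []
    for pts in (kpts[0:6], kpts[6:12], kpts[12:18]):  # "eye", "ear", "muzzle" groups
        x0, y0 = pts.min(dim=0).values.int().tolist()
        x1, y1 = pts.max(dim=0).values.int().tolist()
        crop = frame_bgr[max(y0, 0):y1 + 1, max(x0, 0):x1 + 1]
        crop = cv2.resize(crop, (224, 224))[:, :, ::-1].copy()       # BGR -> RGB
        t = torch.from_numpy(crop).float().permute(2, 0, 1) / 255.0
        with torch.no_grad():
            feats.append(backbone(t.unsqueeze(0)).mean(dim=(2, 3)))  # (1, 1280) per patch
    return torch.cat(feats, dim=1)                    # (1, 3840)

class PainLSTM(nn.Module):
    """128-unit LSTM over 5-frame descriptor sequences -> clip-level pain probability."""
    def __init__(self, feat_dim=3840, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):                           # seq: (batch, 5, 3840)
        _, (h, _) = self.lstm(seq)
        return torch.sigmoid(self.head(h[-1]))
```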
41 pages, 5112 KiB  
Article
Deepfake Face Detection and Adversarial Attack Defense Method Based on Multi-Feature Decision Fusion
by Shanzhong Lei, Junfang Song, Feiyang Feng, Zhuyang Yan and Aixin Wang
Appl. Sci. 2025, 15(12), 6588; https://doi.org/10.3390/app15126588 - 11 Jun 2025
Viewed by 1519
Abstract
The rapid advancement in deep forgery technology in recent years has created highly deceptive face video content, posing significant security risks. Detecting these fakes is increasingly urgent and challenging. To improve the accuracy of deepfake face detection models and strengthen their resistance to adversarial attacks, this manuscript introduces a method for detecting forged faces and defending against adversarial attacks based on a multi-feature decision fusion. This approach allows for rapid detection of fake faces while effectively countering adversarial attacks. Firstly, an improved IMTCCN network was employed to precisely extract facial features, complemented by a diffusion model for noise reduction and artifact removal. Subsequently, the FG-TEFusionNet (Facial-geometry and Texture enhancement fusion-Net) model was developed for deepfake face detection and assessment. This model comprises two key modules: one for extracting temporal features between video frames and another for spatial features within frames. Initially, a facial geometry landmark calibration module based on the LRNet baseline framework ensured an accurate representation of facial geometry. A SENet attention mechanism was then integrated into the dual-stream RNN to enhance the model’s capability to extract inter-frame information and derive preliminary assessment results based on inter-frame relationships. Additionally, a Gram image texture feature module was designed and integrated into EfficientNet and the attention maps of WSDAN (Weakly Supervised Data Augmentation Network). This module aims to extract deep-level feature information from the texture structure of image frames, addressing the limitations of purely geometric features. The final decisions from both modules were integrated using a voting method, completing the deepfake face detection process. Ultimately, the model’s robustness was validated by generating adversarial samples using the I-FGSM algorithm and optimizing model performance through adversarial training. Extensive experiments demonstrated the superior performance and effectiveness of the proposed method across four subsets of FaceForensics++ and the Celeb-DF dataset. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
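The robustness evaluation above relies on I-FGSM adversarial samples, and the two detection modules are combined by a voting rule. Below is a minimal sketch of iterative FGSM and a simple soft-vote fusion for a generic PyTorch classifier; the step size, epsilon, and fusion rule are placeholders rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return adversarial images constrained to an L-infinity ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()          # ascend the loss
        x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)      # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                             # keep valid pixel range
    return x_adv.detach()

def fuse_decisions(p_temporal, p_texture):
    """Soft-vote fusion of the two module probabilities (illustrative rule only)."""
    return (p_temporal + p_texture) / 2 > 0.5
```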

23 pages, 1664 KiB  
Article
Seeing the Unseen: Real-Time Micro-Expression Recognition with Action Units and GPT-Based Reasoning
by Gabriela Laura Sălăgean, Monica Leba and Andreea Cristina Ionica
Appl. Sci. 2025, 15(12), 6417; https://doi.org/10.3390/app15126417 - 6 Jun 2025
Viewed by 1566
Abstract
This paper presents a real-time system for the detection and classification of facial micro-expressions, evaluated on the CASME II dataset. Micro-expressions are brief and subtle indicators of genuine emotions, posing significant challenges for automatic recognition due to their low intensity, short duration, and inter-subject variability. To address these challenges, the proposed system integrates advanced computer vision techniques, rule-based classification grounded in the Facial Action Coding System, and artificial intelligence components. The architecture employs MediaPipe for facial landmark tracking and action unit extraction, expert rules to resolve common emotional confusions, and deep learning modules for optimized classification. Experimental validation demonstrated a classification accuracy of 93.30% on CASME II, highlighting the effectiveness of the hybrid design. The system also incorporates mechanisms for amplifying weak signals and adapting to new subjects through continuous knowledge updates. These results confirm the advantages of combining domain expertise with AI-driven reasoning to improve micro-expression recognition. The proposed methodology has practical implications for various fields, including clinical psychology, security, marketing, and human-computer interaction, where the accurate interpretation of emotional micro-signals is essential. Full article
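The landmark-tracking front end described above can be approximated with MediaPipe Face Mesh plus hand-written FACS-style rules. The sketch below shows one crude rule, an AU12-like lip-corner-pull score; the landmark indices and the score definition are illustrative assumptions, not the paper's rule base.

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

def lip_corner_pull_score(frame_bgr):
    """Return a crude AU12-style ratio (mouth width over height), or None if no face."""
    with mp_face_mesh.FaceMesh(static_image_mode=True, refine_landmarks=True) as fm:
        res = fm.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        return None
    lm = res.multi_face_landmarks[0].landmark
    left, right = lm[61], lm[291]          # mouth corners (assumed indices)
    top, bottom = lm[13], lm[14]           # inner lips (assumed indices)
    mouth_width = abs(right.x - left.x)
    mouth_height = abs(bottom.y - top.y)
    # A wide, shallow mouth opening hints at lip-corner pull (AU12) activation.
    return mouth_width / (mouth_height + 1e-6)
```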

19 pages, 1840 KiB  
Article
Facial Analysis for Plastic Surgery in the Era of Artificial Intelligence: A Comparative Evaluation of Multimodal Large Language Models
by Syed Ali Haider, Srinivasagam Prabha, Cesar A. Gomez-Cabello, Sahar Borna, Ariana Genovese, Maissa Trabilsy, Adekunle Elegbede, Jenny Fei Yang, Andrea Galvao, Cui Tao and Antonio Jorge Forte
J. Clin. Med. 2025, 14(10), 3484; https://doi.org/10.3390/jcm14103484 - 16 May 2025
Cited by 1 | Viewed by 1064
Abstract
Background/Objectives: Facial analysis is critical for preoperative planning in facial plastic surgery, but traditional methods can be time-consuming and subjective. This study investigated the potential of Artificial Intelligence (AI) for objective and efficient facial analysis in plastic surgery, with a specific focus on Multimodal Large Language Models (MLLMs). We evaluated their ability to analyze facial skin quality, volume, symmetry, and adherence to aesthetic standards such as neoclassical facial canons and the golden ratio. Methods: We evaluated four MLLMs—ChatGPT-4o, ChatGPT-4, Gemini 1.5 Pro, and Claude 3.5 Sonnet—using two evaluation forms and 15 diverse facial images generated by a Generative Adversarial Network (GAN). The general analysis form evaluated qualitative skin features (texture, type, thickness, wrinkling, photoaging, and overall symmetry). The facial ratios form assessed quantitative structural proportions, including division into equal fifths, adherence to the rule of thirds, and compatibility with the golden ratio. MLLM assessments were compared with evaluations from a plastic surgeon and manual measurements of facial ratios. Results: The MLLMs showed promise in analyzing qualitative features, but they struggled with precise quantitative measurements of facial ratios. Mean accuracies for general analysis were ChatGPT-4o (0.61 ± 0.49), Gemini 1.5 Pro (0.60 ± 0.49), ChatGPT-4 (0.57 ± 0.50), and Claude 3.5 Sonnet (0.52 ± 0.50). In facial ratio assessments, scores were lower, with Gemini 1.5 Pro achieving the highest mean accuracy (0.39 ± 0.49). Inter-rater reliability, based on Cohen’s Kappa values, ranged from poor to high for qualitative assessments (κ > 0.7 for some questions) but was generally poor (near or below zero) for quantitative assessments. Conclusions: Current general-purpose MLLMs are not yet ready to replace manual clinical assessments but may assist in general facial feature analysis. These findings are based on testing models not specifically trained for facial analysis and serve to raise awareness among clinicians regarding the current capabilities and inherent limitations of readily available MLLMs in this specialized domain. This limitation may stem from challenges with spatial reasoning and fine-grained detail extraction, which are inherent limitations of current MLLMs. Future research should focus on enhancing the numerical accuracy and reliability of MLLMs for broader application in plastic surgery, potentially through improved training methods and integration with other AI technologies such as specialized computer vision algorithms for precise landmark detection and measurement. Full article
(This article belongs to the Special Issue Innovation in Hand Surgery)
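The quantitative checks the abstract describes (equal fifths, rule of thirds, one golden-ratio proportion) reduce to simple ratios over landmark coordinates. Below is a hedged sketch using a hypothetical landmark dictionary in pixel units; the key names and the specific golden-ratio proportion are assumptions for illustration, not the study's measurement protocol.

```python
import numpy as np

GOLDEN = (1 + 5 ** 0.5) / 2

def facial_ratio_report(pts):
    """pts: dict of (x, y) pixel coordinates for a frontal face (assumed key names)."""
    # Horizontal fifths: five segment widths across the face should be equal.
    fifths = np.diff([pts["right_face_edge"][0], pts["right_eye_outer"][0],
                      pts["right_eye_inner"][0], pts["left_eye_inner"][0],
                      pts["left_eye_outer"][0], pts["left_face_edge"][0]])
    # Vertical thirds: hairline-glabella, glabella-subnasale, subnasale-menton.
    thirds = np.diff([pts["hairline"][1], pts["glabella"][1],
                      pts["subnasale"][1], pts["menton"][1]])
    # One commonly cited vertical proportion compared against the golden ratio.
    golden = (pts["menton"][1] - pts["glabella"][1]) / \
             (pts["subnasale"][1] - pts["glabella"][1] + 1e-6)
    return {
        "fifths_cv": float(np.std(fifths) / np.mean(fifths)),   # 0 = perfectly equal fifths
        "thirds_cv": float(np.std(thirds) / np.mean(thirds)),   # 0 = perfectly equal thirds
        "golden_ratio_error": abs(golden - GOLDEN),
    }
```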

30 pages, 1749 KiB  
Article
Deepfake Image Forensics for Privacy Protection and Authenticity Using Deep Learning
by Saud Sohail, Syed Muhammad Sajjad, Adeel Zafar, Zafar Iqbal, Zia Muhammad and Muhammad Kazim
Information 2025, 16(4), 270; https://doi.org/10.3390/info16040270 - 27 Mar 2025
Cited by 1 | Viewed by 4020
Abstract
This research focuses on the detection of deepfake images and videos for forensic analysis using deep learning techniques. It highlights the importance of preserving privacy and authenticity in digital media. The background of the study emphasizes the growing threat of deepfakes, which pose significant challenges in various domains, including social media, politics, and entertainment. Current methodologies primarily rely on visual features that are specific to the dataset and fail to generalize well across varying manipulation techniques. However, these techniques focus on either spatial or temporal features individually and lack robustness in handling complex deepfake artifacts that involve fused facial regions such as eyes, nose, and mouth. Key approaches include the use of CNNs, RNNs, and hybrid models like CNN-LSTM, CNN-GRU, and temporal convolutional networks (TCNs) to capture both spatial and temporal features during the detection of deepfake videos and images. The research incorporates data augmentation with GANs to enhance model performance and proposes an innovative fusion of artifact inspection and facial landmark detection for improved accuracy. The experimental results show near-perfect detection accuracy across diverse datasets, demonstrating the effectiveness of these models. However, challenges remain, such as the difficulty of detecting deepfakes in compressed video formats, the need for handling noise and addressing dataset imbalances. The research presents an enhanced hybrid model that improves detection accuracy while maintaining performance across various datasets. Future work includes improving model generalization to detect emerging deepfake techniques better. The experimental results reveal a near-perfect accuracy of over 99% across different architectures, highlighting their effectiveness in forensic investigations. Full article
(This article belongs to the Special Issue Real-World Applications of Machine Learning Techniques)

21 pages, 6255 KiB  
Article
Joint Driver State Classification Approach: Face Classification Model Development and Facial Feature Analysis Improvement
by Farkhod Akhmedov, Halimjon Khujamatov, Mirjamol Abdullaev and Heung-Seok Jeon
Sensors 2025, 25(5), 1472; https://doi.org/10.3390/s25051472 - 27 Feb 2025
Viewed by 848
Abstract
Driver drowsiness remains a critical factor in road safety, necessitating the development of robust detection methodologies. This study presents a dual-framework approach that integrates a convolutional neural network (CNN) and a facial landmark analysis model to enhance drowsiness detection. The CNN model classifies driver states into “Awake” and “Drowsy”, achieving a classification accuracy of 92.5%. In parallel, a deep learning-based facial landmark analysis model analyzes a driver’s physiological state by extracting and analyzing facial features. The model’s accuracy was significantly enhanced through advanced image preprocessing techniques, including image normalization, illumination correction, and face hallucination, reaching a 97.33% classification accuracy. The proposed dual-model architecture leverages imagery analysis to detect key drowsiness indicators, such as eye closure dynamics, yawning patterns, and head movement trajectories. By integrating CNN-based classification with precise facial landmark analysis, this study not only improves detection robustness but also ensures greater resilience under challenging conditions, such as low-light environments. The findings underscore the efficacy of multi-model approaches in drowsiness detection and their potential for real-world implementation to enhance road safety and mitigate drowsiness-related vehicular accidents. Full article
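Two of the preprocessing steps the abstract credits for the accuracy gain, illumination correction and image normalization, are commonly implemented as CLAHE on the lightness channel plus per-image standardization. The sketch below assumes that recipe; the paper's exact preprocessing may differ.

```python
import cv2
import numpy as np

def correct_illumination(frame_bgr):
    """Apply CLAHE to the L channel of the LAB image to even out lighting."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def normalize(frame_bgr):
    """Zero-mean, unit-variance normalization expected by most CNN classifiers."""
    x = frame_bgr.astype(np.float32) / 255.0
    return (x - x.mean()) / (x.std() + 1e-6)
```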

18 pages, 1223 KiB  
Article
GazeCapsNet: A Lightweight Gaze Estimation Framework
by Shakhnoza Muksimova, Yakhyokhuja Valikhujaev, Sabina Umirzakova, Jushkin Baltayev and Young Im Cho
Sensors 2025, 25(4), 1224; https://doi.org/10.3390/s25041224 - 17 Feb 2025
Cited by 1 | Viewed by 1748
Abstract
Gaze estimation is increasingly pivotal in applications spanning virtual reality, augmented reality, and driver monitoring systems, necessitating efficient yet accurate models for mobile deployment. Current methodologies often fall short, particularly in mobile settings, due to their extensive computational requirements or reliance on intricate pre-processing. Addressing these limitations, we present Mobile-GazeCapsNet, an innovative gaze estimation framework that harnesses the strengths of capsule networks and integrates them with lightweight architectures such as MobileNet v2, MobileOne, and ResNet-18. This framework not only eliminates the need for facial landmark detection but also significantly enhances real-time operability on mobile devices. Through the innovative use of Self-Attention Routing, GazeCapsNet dynamically allocates computational resources, thereby improving both accuracy and efficiency. Our results demonstrate that GazeCapsNet achieves competitive performance by optimizing capsule networks for gaze estimation through Self-Attention Routing (SAR), which replaces iterative routing with a lightweight attention-based mechanism, improving computational efficiency. Our results show that GazeCapsNet achieves state-of-the-art (SOTA) performance on several benchmark datasets, including ETH-XGaze and Gaze360, achieving a mean angular error (MAE) reduction of up to 15% compared to existing models. Furthermore, the model maintains a real-time processing capability of 20 milliseconds per frame while requiring only 11.7 million parameters, making it exceptionally suitable for real-time applications in resource-constrained environments. These findings not only underscore the efficacy and practicality of GazeCapsNet but also establish a new standard for mobile gaze estimation technologies. Full article
(This article belongs to the Section Sensor Networks)
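The Self-Attention Routing (SAR) idea, replacing iterative capsule routing with an attention mechanism over input capsules, can be sketched as a single-head attention layer as below. The dimensions, single head, and scaling are simplifying assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class SelfAttentionRouting(nn.Module):
    """Each output capsule attends over votes from the input capsules (one pass, no iteration)."""
    def __init__(self, in_dim, out_caps, out_dim):
        super().__init__()
        self.vote = nn.Linear(in_dim, out_caps * out_dim)    # votes from each input capsule
        self.query = nn.Parameter(torch.randn(out_caps, out_dim))
        self.out_caps, self.out_dim = out_caps, out_dim

    def forward(self, u):                                     # u: (batch, in_caps, in_dim)
        b, n, _ = u.shape
        votes = self.vote(u).view(b, n, self.out_caps, self.out_dim)
        # Attention score of each input capsule for each output capsule.
        scores = torch.einsum("bnod,od->bno", votes, self.query) / self.out_dim ** 0.5
        attn = scores.softmax(dim=1)                          # normalize over input capsules
        return torch.einsum("bno,bnod->bod", attn, votes)     # (batch, out_caps, out_dim)
```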

24 pages, 2289 KiB  
Article
A Non-Invasive Approach for Facial Action Unit Extraction and Its Application in Pain Detection
by Mondher Bouazizi, Kevin Feghoul, Shengze Wang, Yue Yin and Tomoaki Ohtsuki
Bioengineering 2025, 12(2), 195; https://doi.org/10.3390/bioengineering12020195 - 17 Feb 2025
Cited by 1 | Viewed by 2065
Abstract
A significant challenge that hinders advancements in medical research is the sensitive and confidential nature of patient data in available datasets. In particular, sharing patients’ facial images poses considerable privacy risks, especially with the rise of generative artificial intelligence (AI), which could misuse such data if accessed by unauthorized parties. However, facial expressions are a valuable source of information for doctors and researchers, which creates a need for methods to derive them without compromising patient privacy or safety by exposing identifiable facial images. To address this, we present a quick, computationally efficient method for detecting action units (AUs) and their intensities—key indicators of health and emotion—using only 3D facial landmarks. Our proposed framework extracts 3D face landmarks from video recordings and employs a lightweight neural network (NN) to identify AUs and estimate AU intensities based on these landmarks. Our proposed method reaches a 79.25% F1-score in AU detection for the main AUs, and 0.66 in AU intensity estimation Root Mean Square Error (RMSE). This performance shows that it is possible for researchers to share 3D landmarks, which are far less intrusive, instead of facial images while maintaining high accuracy in AU detection. Moreover, to showcase the usefulness of our AU detection model, using the detected AUs and estimated intensities, we trained state-of-the-art Deep Learning (DL) models to detect pain. Our method reaches 91.16% accuracy in pain detection, which is not far behind the 93.14% accuracy obtained when employing a convolutional neural network (CNN) with residual blocks trained on actual images and the 92.11% accuracy obtained when employing all the ground-truth AUs. Full article
(This article belongs to the Section Biosignal Processing)
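The lightweight landmark-to-AU network described above can be pictured as a small MLP over flattened 3D landmarks with separate heads for AU presence and AU intensity. The landmark count, AU count, and layer widths below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LandmarkAUNet(nn.Module):
    """Small MLP: flattened 3D landmarks -> AU presence probabilities and AU intensities."""
    def __init__(self, n_landmarks=478, n_aus=12):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_landmarks * 3, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.au_presence = nn.Linear(128, n_aus)    # logits for AU occurrence
        self.au_intensity = nn.Linear(128, n_aus)   # regressed intensities (e.g., 0-5 scale)

    def forward(self, landmarks):                   # landmarks: (batch, n_landmarks, 3)
        h = self.trunk(landmarks.flatten(1))
        return torch.sigmoid(self.au_presence(h)), self.au_intensity(h)
```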

27 pages, 5537 KiB  
Article
Real-Time Gaze Estimation Using Webcam-Based CNN Models for Human–Computer Interactions
by Visal Vidhya and Diego Resende Faria
Computers 2025, 14(2), 57; https://doi.org/10.3390/computers14020057 - 10 Feb 2025
Cited by 1 | Viewed by 3589
Abstract
Gaze tracking and estimation are essential for understanding human behavior and enhancing human–computer interactions. This study introduces an innovative, cost-effective solution for real-time gaze tracking using a standard webcam, providing a practical alternative to conventional methods that rely on expensive infrared (IR) cameras. Traditional approaches, such as Pupil Center Corneal Reflection (PCCR), require IR cameras to capture corneal reflections and iris glints, demanding high-resolution images and controlled environments. In contrast, the proposed method utilizes a convolutional neural network (CNN) trained on webcam-captured images to achieve precise gaze estimation. The developed deep learning model achieves a mean squared error (MSE) of 0.0112 and an accuracy of 90.98% through a novel trajectory-based accuracy evaluation system. This system involves an animation of a ball moving across the screen, with the user’s gaze following the ball’s motion. Accuracy is determined by calculating the proportion of gaze points falling within a predefined threshold based on the ball’s radius, ensuring a comprehensive evaluation of the system’s performance across all screen regions. Data collection is both simplified and effective, capturing images of the user’s right eye while they focus on the screen. Additionally, the system includes advanced gaze analysis tools, such as heat maps, gaze fixation tracking, and blink rate monitoring, which are all integrated into an intuitive user interface. The robustness of this approach is further enhanced by incorporating Google’s Mediapipe model for facial landmark detection, improving accuracy and reliability. The evaluation results demonstrate that the proposed method delivers high-accuracy gaze prediction without the need for expensive equipment, making it a practical and accessible solution for diverse applications in human–computer interactions and behavioral research. Full article
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)
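The trajectory-based accuracy evaluation reduces to a simple computation: the fraction of per-frame gaze predictions that fall within a radius-derived threshold of the moving ball's center. A hedged sketch follows, with the tolerance factor as an assumption.

```python
import numpy as np

def trajectory_accuracy(gaze_xy, ball_xy, ball_radius, tolerance=1.5):
    """gaze_xy, ball_xy: (N, 2) pixel coordinates per frame; returns the hit fraction."""
    gaze_xy, ball_xy = np.asarray(gaze_xy, float), np.asarray(ball_xy, float)
    dist = np.linalg.norm(gaze_xy - ball_xy, axis=1)
    hits = dist <= tolerance * ball_radius          # gaze point inside the threshold circle
    return float(hits.mean())
```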

26 pages, 3207 KiB  
Article
A Novel Face Frontalization Method by Seamlessly Integrating Landmark Detection and Decision Forest into Generative Adversarial Network (GAN)
by Mahmood H. B. Alhlffee and Yea-Shuan Huang
Mathematics 2025, 13(3), 499; https://doi.org/10.3390/math13030499 - 2 Feb 2025
Viewed by 1679
Abstract
In real-world scenarios, posture variation and low-quality image resolution are two well-known factors that compromise the accuracy and reliability of face recognition systems. These challenges can be overcome using various methods, including Generative Adversarial Networks (GANs). Despite this, concerns over the accuracy and reliability of GAN methods are increasing as the facial recognition market expands rapidly. An existing framework, the Two-Pathway GAN (TP-GAN) method, has demonstrated superiority over numerous GAN methods, providing better face-texture details due to its unique deep neural network structure, which allows it to perceive local details and global structure in a supervised manner. TP-GAN overcomes some of the obstacles associated with face frontalization tasks through the use of landmark detection and synthesis functions, but it remains challenging to achieve the desired performance across a wide range of datasets. To address the inherent limitations of TP-GAN, we propose a novel face frontalization method (NFF) combining landmark detection, decision forests, and data augmentation. NFF provides 2D landmark detection to integrate global structure with local details of the generator model so that more accurate facial feature representations and robust feature extractions can be achieved. NFF enhances the stability of the discriminator model over time by integrating decision forest capabilities into the TP-GAN discriminator core architecture that allows us to perform a wide range of facial pose tasks. Moreover, NFF uses data augmentation techniques to maximize training data by generating completely new synthetic data from existing data. Our evaluations are based on the Multi-PIE, FEI, and CAS-PEAL datasets. NFF results indicate that TP-GAN performance can be significantly enhanced by resolving the challenges described above, leading to high-quality visualizations and rank-1 face identification. Full article
(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)

15 pages, 4304 KiB  
Article
Face and Voice Recognition-Based Emotion Analysis System (EAS) to Minimize Heterogeneity in the Metaverse
by Surak Son and Yina Jeong
Appl. Sci. 2025, 15(2), 845; https://doi.org/10.3390/app15020845 - 16 Jan 2025
Viewed by 2697
Abstract
The metaverse, where users interact through avatars, is evolving to closely mirror the real world, requiring realistic object responses based on users’ emotions. While technologies like eye-tracking and hand-tracking transfer physical movements into virtual spaces, accurate emotion detection remains challenging. This study proposes the “Face and Voice Recognition-based Emotion Analysis System (EAS)” to bridge this gap, assessing emotions through both voice and facial expressions. EAS utilizes a microphone and camera to gauge emotional states, combining these inputs for a comprehensive analysis. It comprises three neural networks: the Facial Emotion Analysis Model (FEAM), which classifies emotions using facial landmarks; the Voice Sentiment Analysis Model (VSAM), which detects vocal emotions even in noisy environments using MCycleGAN; and the Metaverse Emotion Recognition Model (MERM), which integrates FEAM and VSAM outputs to infer overall emotional states. EAS’s three primary modules—Facial Emotion Recognition, Voice Emotion Recognition, and User Emotion Analysis—analyze facial features and vocal tones to detect emotions, providing a holistic emotional assessment for realistic interactions in the metaverse. The system’s performance is validated through dataset testing, and future directions are suggested based on simulation outcomes. Full article

18 pages, 1761 KiB  
Article
Computer Vision-Based Drowsiness Detection Using Handcrafted Feature Extraction for Edge Computing Devices
by Valerius Owen and Nico Surantha
Appl. Sci. 2025, 15(2), 638; https://doi.org/10.3390/app15020638 - 10 Jan 2025
Cited by 1 | Viewed by 2297
Abstract
Drowsy driving contributes to over 6000 fatal incidents annually in the US, underscoring the need for effective, non-intrusive drowsiness detection. This study seeks to address detection challenges, particularly in non-standard head positions. Our innovative approach leverages computer vision by combining facial feature detection using Dlib, head pose estimation with the HOPEnet model, and analyses of the percentage of eyelid closure over time (PERCLOS) and the percentage of mouth opening over time (POM). These are integrated with traditional machine learning models, such as Support Vector Machines, Random Forests, and XGBoost. These models were chosen for their ability to process detailed information from facial landmarks, head poses, PERCLOS, and POM. They achieved a high overall accuracy of 86.848% in detecting drowsiness, with a small overall model size of 5.05 MB and increased computational efficiency. The models were trained on the National Tsing Hua University Driver Drowsiness Detection Dataset, making them highly suitable for devices with a limited computational capacity. Compared to the baseline model from the literature, which achieved an accuracy of 84.82% and a larger overall model size of 37.82 MB, the method proposed in this research shows a notable improvement in the efficiency of the model with relatively similar accuracy. These findings provide a framework for future studies, potentially improving sleepiness detection systems and ultimately saving lives by enhancing road safety. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
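PERCLOS and POM, as used above, are rolling-window statistics: the fraction of recent frames in which the eyes are judged closed or the mouth judged open. The sketch below assumes per-frame openness ratios coming from a landmark model; the window length and thresholds are illustrative, not the paper's settings.

```python
from collections import deque

class WindowStat:
    """Rolling fraction of frames for which a boolean condition held."""
    def __init__(self, window_frames=900):          # e.g., 30 s at 30 fps
        self.buf = deque(maxlen=window_frames)

    def update(self, flag):
        self.buf.append(bool(flag))
        return sum(self.buf) / len(self.buf)

perclos = WindowStat()
pom = WindowStat()

def step(eye_openness, mouth_openness, eye_thr=0.2, mouth_thr=0.5):
    """Feed per-frame openness ratios; returns (PERCLOS, POM) over the current window."""
    return perclos.update(eye_openness < eye_thr), pom.update(mouth_openness > mouth_thr)
```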

25 pages, 8065 KiB  
Article
Drowsiness Detection in Drivers Using Facial Feature Analysis
by Ebenezer Essel, Fred Lacy, Fatema Albalooshi, Wael Elmedany and Yasser Ismail
Appl. Sci. 2025, 15(1), 20; https://doi.org/10.3390/app15010020 - 24 Dec 2024
Cited by 5 | Viewed by 3042
Abstract
Drowsiness has been recognized as a leading factor in road accidents worldwide. Despite considerable research in this area, this paper aims to improve the precision of drowsiness detection specifically for long-haul travel by employing the Dlib-based facial feature detection algorithm. This study proposes two algorithms: a static frame threshold and an adaptive frame threshold. Both approaches utilize eye closure ratio (ECR) and mouth aperture ratio (MAR) parameters to determine the driver’s level of drowsiness. The static threshold method issues a warning when the ECR and/or MAR values reach specific thresholds. In this method, the ECR threshold is established at 0.15 and the MAR threshold at 0.4. The static threshold method demonstrated an accuracy of 89.4% and a sensitivity of 96.5% using 1000 images. The adaptive frame threshold algorithm uses a counter to monitor the number of consecutive frames that meet the drowsiness criteria before triggering a warning. Additionally, the number of consecutive frames required is adjusted dynamically over time to enhance detection accuracy and more accurately indicate a state of drowsiness. The adaptive frame threshold algorithm was tested using four 30-min videos from a publicly available dataset, achieving a maximum accuracy of 98.2% and a sensitivity of 64.3% with 500 images. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
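The static-threshold rule maps directly onto Dlib's 68-point landmarks: compute an eye closure ratio and a mouth aperture ratio per frame and compare them against the quoted 0.15 and 0.4 thresholds. The aspect-ratio formulas below are the common ones for that landmark layout and are an assumption about the paper's exact definitions.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def aspect_ratio(p):
    """p: six points around an eye or mouth; vertical spread over horizontal spread."""
    return (np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])) / \
           (2.0 * np.linalg.norm(p[0] - p[3]) + 1e-6)

def drowsy(gray_frame, ecr_thr=0.15, mar_thr=0.4):
    faces = detector(gray_frame)
    if not faces:
        return False
    pts = np.array([(q.x, q.y) for q in predictor(gray_frame, faces[0]).parts()])
    ecr = (aspect_ratio(pts[36:42]) + aspect_ratio(pts[42:48])) / 2   # both eyes
    mar = aspect_ratio(pts[[48, 50, 52, 54, 56, 58]])                 # outer mouth points
    return ecr < ecr_thr or mar > mar_thr                             # warning condition
```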

26 pages, 4018 KiB  
Article
A MediaPipe Holistic Behavior Classification Model as a Potential Model for Predicting Aggressive Behavior in Individuals with Dementia
by Ioannis Galanakis, Rigas Filippos Soldatos, Nikitas Karanikolas, Athanasios Voulodimos, Ioannis Voyiatzis and Maria Samarakou
Appl. Sci. 2024, 14(22), 10266; https://doi.org/10.3390/app142210266 - 7 Nov 2024
Cited by 4 | Viewed by 2165
Abstract
This paper introduces a classification model that detects and classifies argumentative behaviors between two individuals by utilizing a machine learning application based on the MediaPipe Holistic model. The approach distinguishes two classes based on the behavior of the two individuals, argumentative and non-argumentative, corresponding to verbal argumentative behavior. Using a dataset extracted from video frames of hand gestures, body stance, and facial expressions, together with their corresponding landmarks, three different classification models were trained and evaluated. The results indicate that the Random Forest Classifier outperformed the other two, classifying argumentative behaviors with 68.07% accuracy and non-argumentative behaviors with 94.18% accuracy. Thus, there is future scope for advancing this classification model to a prediction model, with the aim of predicting aggressive behavior in patients suffering from dementia before its onset. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
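The feature path described above, MediaPipe Holistic landmarks flattened into per-frame vectors and fed to a Random Forest, can be sketched as follows. The flattening scheme, zero-filling for missing parts, and class labels are assumptions for illustration, not the paper's exact pipeline.

```python
import cv2
import mediapipe as mp
import numpy as np
from sklearn.ensemble import RandomForestClassifier

holistic = mp.solutions.holistic.Holistic(static_image_mode=True)

def frame_features(frame_bgr):
    """Flatten pose, face, and hand landmarks into one feature vector per frame."""
    res = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    feats = []
    for lms, n in ((res.pose_landmarks, 33), (res.face_landmarks, 468),
                   (res.left_hand_landmarks, 21), (res.right_hand_landmarks, 21)):
        if lms is None:
            feats.append(np.zeros(n * 3))            # missing body part -> zero block
        else:
            feats.append(np.array([[p.x, p.y, p.z] for p in lms.landmark]).ravel())
    return np.concatenate(feats)

# X: stacked frame_features vectors; y: 0 = non-argumentative, 1 = argumentative
# clf = RandomForestClassifier(n_estimators=300).fit(X, y)
```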

16 pages, 8982 KiB  
Article
A Two-Stream Method for Human Action Recognition Using Facial Action Cues
by Zhimao Lai, Yan Zhang and Xiubo Liang
Sensors 2024, 24(21), 6817; https://doi.org/10.3390/s24216817 - 23 Oct 2024
Cited by 1 | Viewed by 1521
Abstract
Human action recognition (HAR) is a critical area in computer vision with wide-ranging applications, including video surveillance, healthcare monitoring, and abnormal behavior detection. Current HAR methods predominantly rely on full-body data, which can limit their effectiveness in real-world scenarios where occlusion is common. In such situations, the face often remains visible, providing valuable cues for action recognition. This paper introduces Face in Action (FIA), a novel two-stream method that leverages facial action cues for robust action recognition under conditions of significant occlusion. FIA consists of an RGB stream and a landmark stream. The RGB stream processes facial image sequences using a fine-spatio-multitemporal (FSM) 3D convolution module, which employs smaller spatial receptive fields to capture detailed local facial movements and larger temporal receptive fields to model broader temporal dynamics. The landmark stream processes facial landmark sequences using a normalized temporal attention (NTA) module within an NTA-GCN block, enhancing the detection of key facial frames and improving overall recognition accuracy. We validate the effectiveness of FIA using the NTU RGB+D and NTU RGB+D 120 datasets, focusing on action categories related to medical conditions. Our experiments demonstrate that FIA significantly outperforms existing methods in scenarios with extensive occlusion, highlighting its potential for practical applications in surveillance and healthcare settings. Full article