Search Results (458)

Search Parameters:
Keywords = human motion dataset

17 pages, 2174 KB  
Article
RadarSSM: A Lightweight Spatiotemporal State Space Network for Efficient Radar-Based Human Activity Recognition
by Rubin Zhao, Fucheng Miao and Yuanjian Liu
Sensors 2026, 26(7), 2259; https://doi.org/10.3390/s26072259 - 6 Apr 2026
Viewed by 193
Abstract
Millimeter-wave radar has gained popularity as a sensing modality for Human Activity Recognition (HAR) in recent years because it preserves individual privacy and is robust to environmental conditions. Nevertheless, fast inference over high-dimensional, sparse 4D radar data remains difficult on low-resource edge devices. Current models, including 3D Convolutional Neural Networks and Transformer-based models, are frequently plagued by extensive parameter overhead or quadratic computational complexity, which restricts their applicability to edge applications. This paper addresses these issues by introducing RadarSSM, a lightweight spatiotemporal hybrid network for radar-based HAR. By explicitly separating spatial feature extraction from temporal dependency modeling, RadarSSM significantly reduces overall computational complexity. Specifically, a spatial encoder based on depthwise separable 3D convolutions is designed to efficiently capture fine-grained geometric and motion features from voxelized radar data. For temporal modeling, a bidirectional State Space Model is introduced to capture long-range temporal dependencies with linear time complexity O(T), thereby avoiding the quadratic cost associated with self-attention mechanisms. Extensive experiments on public radar HAR datasets demonstrate that RadarSSM achieves accuracy competitive with state-of-the-art methods while substantially reducing parameter count and computational cost relative to representative convolutional baselines. These results validate the effectiveness of RadarSSM and highlight its suitability for efficient radar sensing on edge hardware. Full article
(This article belongs to the Special Issue Radar and Multimodal Sensing for Ambient Assisted Living)
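The efficiency claim rests on factorizing each 3D convolution into a per-channel (depthwise) filter followed by a 1x1x1 channel-mixing (pointwise) convolution. A minimal PyTorch sketch of such a block, with illustrative shapes rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel: int = 3):
        super().__init__()
        # Depthwise: one 3D filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel, padding=kernel // 2, groups=in_ch)
        # Pointwise: 1x1x1 convolution mixes channels cheaply.
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)
        self.norm = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.pointwise(self.depthwise(x))))

# Illustrative voxel clip: (batch, channels, depth, height, width).
x = torch.randn(2, 8, 16, 32, 32)
print(DepthwiseSeparableConv3d(8, 32)(x).shape)  # torch.Size([2, 32, 16, 32, 32])
```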

18 pages, 2109 KB  
Article
PAGF: Short-Horizon Forecasting of 3D Facial Landmarks
by Mingzhu Yan, Ye Yuan, Jian Liu and Fangyan Yang
Mathematics 2026, 14(7), 1222; https://doi.org/10.3390/math14071222 - 6 Apr 2026
Viewed by 148
Abstract
Short-term facial landmark forecasting is important for anticipatory facial behavior in human–robot interaction, yet models trained with pointwise reconstruction losses often suffer from mean reversion, producing low-error predictions with weakened motion dynamics. To address this issue, we propose a peak-aware gated recurrent unit (GRU) framework that separates forecasting into peak planning and peak-conditioned trajectory generation. The planning stage estimates the timing and intensity of a salient motion peak within the forecast horizon together with a global motion direction, and the generation stage produces short-horizon landmark displacements through temporal gating and structured motion composition. The model is trained with reconstruction loss, peak supervision, peak-integrity regularization, and correlation-based temporal-shape regularization. Experiments on the MEAD dataset using 3D facial landmarks under a subject-independent protocol show a clear distortion–dynamics trade-off. Compared with static and sequence-to-sequence baselines, the proposed method better preserves peak-related facial dynamics while maintaining competitive 24-step prediction accuracy. Full article
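As a rough illustration of the planning stage described above, a GRU encoder over past landmark motion can feed separate heads for peak timing (a distribution over the forecast horizon) and peak intensity. Dimensions and head designs here are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class PeakPlanner(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 128, horizon: int = 24):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.timing_head = nn.Linear(hidden, horizon)   # logits over future steps
        self.intensity_head = nn.Linear(hidden, 1)      # scalar peak magnitude

    def forward(self, past: torch.Tensor):
        _, h = self.gru(past)                 # past: (batch, T_in, feat_dim)
        h = h[-1]                             # last-layer hidden state
        timing = self.timing_head(h).softmax(dim=-1)
        intensity = self.intensity_head(h)
        return timing, intensity              # conditions the trajectory decoder

planner = PeakPlanner(feat_dim=68 * 3)        # assumed: 68 landmarks x 3 coordinates
timing, intensity = planner(torch.randn(4, 30, 68 * 3))
```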

31 pages, 6317 KB  
Article
A Method for Human Pose Estimation and Joint Angle Computation Through Deep Learning
by Ludovica Ciardiello, Patrizia Agnello, Marta Petyx, Fabio Martinelli, Mario Cesarelli, Antonella Santone and Francesco Mercaldo
J. Imaging 2026, 12(4), 157; https://doi.org/10.3390/jimaging12040157 - 6 Apr 2026
Viewed by 207
Abstract
Human pose estimation is a crucial task in computer vision with widespread applications in healthcare, rehabilitation, sports, and remote monitoring. In this paper, we propose a deep learning-based method for automatic human pose estimation and joint angle computation, tailored specifically for physiotherapy and telemedicine scenarios. Beyond pose estimation, the proposed method is able to compute angles between joints, enabling analysis of body alignment and posture. The proposed approach is built upon a customized skeleton with 25 anatomical keypoints and a dataset composed of over 150,000 annotated and augmented images derived from multiple open-source datasets. Experimental results demonstrate the effectiveness of the proposed method, achieving an mAP@50 of 0.58 for keypoint localization and 0.98 for object detection. Moreover, we demonstrate several real-world practical use cases in evaluating exercise correctness and identifying postural deviations, confirming that the method represents a promising approach for automated motion analysis, with potential impact on digital health, rehabilitation support, and remote patient care. Full article
(This article belongs to the Section AI in Imaging)
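The joint-angle computation reduces to the angle at a middle keypoint between two limb segments. A minimal NumPy sketch (the keypoint names are hypothetical; the paper uses a custom 25-keypoint skeleton):

```python
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle in degrees at keypoint b, formed by segments b->a and b->c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example: knee angle from hip, knee, and ankle keypoints (2D pixel coordinates).
hip, knee, ankle = np.array([320.0, 210.0]), np.array([330.0, 340.0]), np.array([325.0, 470.0])
print(f"knee angle: {joint_angle(hip, knee, ankle):.1f} deg")
```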

20 pages, 6648 KB  
Article
Sensorless Collision Detection and Classification in Collaborative Robots Using Stacked GRU Networks
by Jong Hyeok Lee, Minjae Hong and Kyu Min Park
Actuators 2026, 15(4), 206; https://doi.org/10.3390/act15040206 - 4 Apr 2026
Viewed by 178
Abstract
The increasing deployment of collaborative robots in industrial manufacturing environments has enabled close human–robot collaboration, making rapid and reliable collision detection essential for worker safety. This paper presents a learning-based framework for real-time detection and classification of hard and soft collisions using stacked Gated Recurrent Unit (GRU) networks. A two-stage pipeline is introduced, in which collision detection and collision type classification are performed sequentially using separate models, and its performance is validated through extensive experiments on a collision dataset collected from a six-joint collaborative robot executing random point-to-point motions. Without requiring joint torque sensors, unmodeled joint friction is implicitly compensated through learning for both detection and classification. Compared to our previous work, the proposed method achieves improved detection performance, and its robustness is further demonstrated through systematic generalization experiments under simulated dynamic model uncertainties. In addition, the classification model accurately distinguishes between hard and soft collisions, providing a basis for differentiated post-collision reaction strategies. Overall, the proposed sensorless collision detection and classification framework provides a practical and cost-effective solution for real-world industrial human–robot collaboration. Full article
(This article belongs to the Special Issue Machine Learning for Actuation and Control in Robotic Joint Systems)
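A sketch of the two-stage pipeline described above, assuming PyTorch: one stacked-GRU model flags a collision from a window of joint signals, and a second model of the same shape classifies the flagged event as hard or soft. Feature choices and sizes are illustrative, not the authors' configuration:

```python
import torch
import torch.nn as nn

class StackedGRUHead(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, layers: int = 2, n_out: int = 2):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, num_layers=layers, batch_first=True)
        self.fc = nn.Linear(hidden, n_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(x)          # x: (batch, window, in_dim)
        return self.fc(out[:, -1])    # logits from the final time step

detector = StackedGRUHead(in_dim=18)    # assumed: 6 joints x (position, velocity, current)
classifier = StackedGRUHead(in_dim=18)  # hard vs. soft, run only when the detector fires
window = torch.randn(1, 100, 18)
if detector(window).argmax(-1).item() == 1:
    collision_type = classifier(window).argmax(-1).item()
```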

24 pages, 8557 KB  
Article
Dynamic Modelling and Control Strategy Analysis of a Lower-Limb Exoskeleton
by Huanrong Xiao, Teng Ran and Afang Jin
Sensors 2026, 26(7), 2124; https://doi.org/10.3390/s26072124 - 29 Mar 2026
Viewed by 323
Abstract
Lower-limb exoskeleton robots play a pivotal role in rehabilitation medicine and assistive augmentation, where precise dynamic modelling and trajectory tracking control are fundamental to effective assistance. Existing models predominantly focus on hip and knee rotational degrees of freedom, with insufficient attention to ankle dynamics and pelvic translation. To address these limitations, this paper establishes a sagittal-plane dynamic model comprising nine generalised coordinates, treating the human lower limb and exoskeleton as an integrated coupled system. A seven-segment kinematic model encompassing the trunk, bilateral thighs, shanks, and feet is constructed via a modified Denavit–Hartenberg parameter method, and dynamic equations are derived using Lagrangian formulation. Three control strategies—PD control, PD with gravity compensation, and the computed torque method—are designed and evaluated through simulations using gait data from five subjects (two self-collected, three from a public dataset) acquired via Vicon motion capture. Results demonstrate that the computed torque method achieves a joint angle tracking root mean square error (RMSE) of 0.59°, representing an 86.3% improvement over conventional PD control, while maintaining a low control torque RMS of 4.44 N·m. The controller exhibits stable tracking performance across walking speeds of 0.4–1.45 m/s, validating the effectiveness of the proposed model and control strategies. Full article
(This article belongs to the Section Sensors and Robotics)
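The computed torque method evaluated above follows the standard feedback-linearizing law tau = M(q)(qdd_d + K_d*ed + K_p*e) + C(q, qd) + G(q). A NumPy sketch with placeholder dynamics terms (M, C, and G are assumed to come from the derived Lagrangian model, evaluated at the current state; the gains are illustrative):

```python
import numpy as np

def computed_torque(q, qd, q_d, qd_d, qdd_d, M, C, G, Kp, Kd):
    """One control step; all arguments are NumPy arrays over the generalized coordinates."""
    e, ed = q_d - q, qd_d - qd       # position and velocity tracking errors
    v = qdd_d + Kd @ ed + Kp @ e     # stabilized reference acceleration
    return M @ v + C + G             # feedback-linearizing joint torque

n = 9                                         # generalized coordinates in the model
Kp, Kd = 100.0 * np.eye(n), 20.0 * np.eye(n)  # illustrative PD gains
```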

22 pages, 8847 KB  
Article
DGAGaze: Gaze Estimation with Dual-Stream Differential Attention and Geometry-Aware Temporal Alignment
by Wei Zhang and Pengcheng Li
Appl. Sci. 2026, 16(7), 3298; https://doi.org/10.3390/app16073298 - 29 Mar 2026
Viewed by 250
Abstract
Gaze estimation plays a crucial role in human-computer interaction and behavior analysis. However, in dynamic scenes, rigid head movements and rapid gaze shifts pose significant challenges to accurate gaze prediction. Most existing methods either process single-frame images independently or rely on long video sequences, making it difficult to simultaneously achieve strong performance and high computational efficiency. To address this issue, we propose DGAGaze, a gaze estimation framework based on a difference-driven spatiotemporal attention mechanism. This framework uses a geometry-aware temporal alignment module to mitigate interference from rigid head movements, compensating for them through pose estimation and affine feature warping, thereby achieving explicit decoupling between global head motion and local eye motion. Based on the aligned features, inter-frame differences are used to adjust spatial and channel attention weights, enhancing motion-sensitive representations without introducing an additional temporal modeling layer. Extensive experiments on the EyeDiap and Gaze360 datasets demonstrate the effectiveness of the proposed approach. DGAGaze achieves improved gaze estimation accuracy while maintaining a lightweight architecture based on a ResNet-18 backbone, outperforming existing state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
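One plausible reading of the difference-driven attention described above, sketched in PyTorch: the per-channel magnitude of the aligned inter-frame feature difference gates channel attention weights so motion-sensitive channels are emphasized. This is a generic illustration, not the authors' exact module:

```python
import torch
import torch.nn as nn

class DiffChannelGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        # Per-channel magnitude of the (aligned) inter-frame difference.
        diff = (feat_t - feat_prev).abs().mean(dim=(2, 3))   # (batch, channels)
        w = self.mlp(diff).unsqueeze(-1).unsqueeze(-1)       # channel weights
        return feat_t * w

gate = DiffChannelGate(64)
out = gate(torch.randn(2, 64, 14, 14), torch.randn(2, 64, 14, 14))
```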

27 pages, 16965 KB  
Article
On-Device Motion Activity Intensity Recognition Using Smartwatch Accelerator
by Seungyeon Kim and Jaehyun Yoo
Electronics 2026, 15(7), 1351; https://doi.org/10.3390/electronics15071351 - 24 Mar 2026
Viewed by 165
Abstract
Wearable device-based Human Activity Recognition (HAR) is widely used in health management, rehabilitation, and personal safety. While contemporary HAR research effectively classifies a wide range of discrete activities, there remains a significant gap in organizing these heterogeneous motions into a structured intensity framework suitable for continuous risk assessment. Furthermore, many high-performing models rely on computationally intensive architectures that hinder real-time deployment on resource-constrained wearables. We propose an on-device method for estimating five-level activity intensity in real time using only accelerometer signals from a commercial smartwatch. To bridge the gap between simple identification and intensity modeling, 13 dynamic and emergency-like wrist motions were integrated with 11 daily activities from the PAMAP2 dataset, yielding 21 activities mapped onto an ordinal five-level intensity scale. A finetuned Multi-Layer Perceptron (MLP) classifier trained on this integrated dataset achieved 0.939 accuracy and a quadratic weighted kappa (QWK) of 0.971. The model was deployed on a Galaxy Watch 7, achieving <1 ms inference latency and a size <0.1 MB, confirming real-time feasibility. This approach demonstrates that organizing diverse activities into a lightweight, intensity-aware framework provides a robust foundation for safety-aware monitoring systems under real-world, on-device constraints. Full article
(This article belongs to the Special Issue Wearable Sensors for Human Position, Attitude and Motion Tracking)
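The quadratic weighted kappa reported above penalizes ordinal misclassifications by the squared distance between intensity levels; scikit-learn computes it directly (the labels here are illustrative):

```python
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 2, 3, 4, 2, 1, 0]   # ground-truth intensity levels (0-4)
y_pred = [0, 1, 2, 4, 4, 2, 0, 0]   # model predictions
print(cohen_kappa_score(y_true, y_pred, weights="quadratic"))
```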

24 pages, 5930 KB  
Article
Style-Abstraction-Based Data Augmentation for Robust Affective Computing
by Xu Qiu, Taewan Kim and Bongjae Kim
Appl. Sci. 2026, 16(6), 3109; https://doi.org/10.3390/app16063109 - 23 Mar 2026
Viewed by 305
Abstract
Personality recognition and emotion recognition, two core tasks within affective computing, are fundamentally constrained by data scarcity, as collecting and annotating human behavioral data is expensive and restricted by privacy concerns. Under these limited data conditions, existing models tend to rely on superficial shortcut features such as background appearance, lighting conditions, or color variations, rather than behavior-relevant cues including facial expressions, posture, and motion dynamics. To address this issue, we propose Style-Abstraction-based Data Augmentation, a style transfer-based augmentation strategy that reduces dependency on low-level appearance information while preserving high-level semantic cues. Specifically, we employ cartoonization to generate stylized variants of training videos that retain expressive characteristics but remove stylistic bias. We validate our approach on three diverse personality benchmarks (First Impression v2, UDIVA v0.5, and KETI) and an emotion benchmark (Emotion Dataset) using state-of-the-art models including ViViT (Video Vision Transformer), TimeSformer, and VST (Video Swin Transformer). Our experiments indicate that increasing the proportion of style-abstracted data in the training set can improve performance on the evaluated datasets. Notably, our method yields consistent gains across all benchmarks: a 0.0893 reduction in MSE on UDIVA v0.5 (with VST), a 0.0023 improvement in 1-MAE on KETI (with TimeSformer), and a 0.0051 improvement on First Impression v2 (with TimeSformer). Furthermore, extending style-abstraction-based data augmentation to a four-class categorical emotion recognition task demonstrates similar performance gains, achieving up to a 3.44% accuracy increase with the TimeSformer backbone. These findings verify that our style-abstraction-based data augmentation facilitates learning of behavior-relevant features by reducing reliance on superficial shortcuts. Overall, cartoonization-based style abstraction for data augmentation functions as both an effective augmentation strategy and a regularization mechanism, encouraging the model to learn more stable and generalizable representations for affective computing applications. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
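A sketch of the mixing step implied above: a fraction of training clips is supplemented with cartoonized variants so the model sees the same behavior under abstracted appearance. The cartoonize helper is a placeholder stub, not a real stylization API:

```python
import random

def cartoonize(clip):
    """Placeholder for an off-the-shelf video cartoonization model."""
    return clip  # a real implementation would return the stylized clip

def build_training_set(clips: list, stylized_fraction: float = 0.5) -> list:
    augmented = []
    for clip in clips:
        augmented.append(clip)                  # always keep the original
        if random.random() < stylized_fraction:
            augmented.append(cartoonize(clip))  # add a style-abstracted variant
    return augmented
```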

23 pages, 3219 KB  
Article
Hybrid Data Curation for Imitation Learning with Physics-Generated Trajectories
by Mincheol Lee, Deun-Sol Cho and Won-Tae Kim
Appl. Sci. 2026, 16(6), 2968; https://doi.org/10.3390/app16062968 - 19 Mar 2026
Viewed by 349
Abstract
Robotic manipulators were initially introduced to replace repetitive human labor and have since evolved to perform complex tasks in dynamic environments. In such systems, imitation learning and reinforcement learning models capable of real-time trajectory generation are widely applied. Among these approaches, imitation learning enables rapid training when high-quality datasets are available. However, it suffers from high costs associated with collecting expert demonstration data and significant performance variability depending on data quality. Recently, learning approaches utilizing large-scale datasets have been explored, but they often struggle to guarantee reliable performance in tasks requiring precise control and incur substantial computational costs for model construction, limiting their applicability as a general-purpose learning strategy. To address these limitations, this paper proposes an imitation learning framework that integrates sampling-based motion planning with a hybrid data curation strategy. The proposed method employs a sampling-based planner (e.g., RRT*) to generate diverse physically feasible trajectories, thereby reducing the cost of acquiring expert demonstration data. The generated trajectories are then curated through clustering-based grouping and rule-based filtering to select high-quality training samples from large-scale datasets. The proposed framework automatically generates physically feasible trajectories while selecting high-quality data from large trajectory pools, thereby improving training stability and reducing data-related costs. Experimental results demonstrate that the proposed method achieves an average success rate of 79.1% (95% CI: 74.3–83.2%) and produces shorter trajectories, lower final distances, and reduced joint movements compared to conventional filtering methods. Full article
(This article belongs to the Special Issue Digital Twin and IoT, 2nd Edition)
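A sketch of the curation stage, assuming scikit-learn k-means for the grouping: trajectories are clustered on simple summary features and rule-filtered within each group. The features and rules here are illustrative assumptions, not the paper's criteria:

```python
import numpy as np
from sklearn.cluster import KMeans

def curate(trajectories, n_groups=5, keep_per_group=20):
    # Per-trajectory summary features: total path length and end-point displacement
    # (a stand-in for rule criteria such as final distance to the goal).
    feats = np.array([[np.linalg.norm(np.diff(t, axis=0), axis=1).sum(),
                       np.linalg.norm(t[-1] - t[0])] for t in trajectories])
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(feats)
    kept = []
    for g in range(n_groups):
        idx = np.where(labels == g)[0]
        idx = idx[np.argsort(feats[idx, 0])]      # rule: prefer shorter paths
        kept.extend(int(i) for i in idx[:keep_per_group])
    return kept

# Example with synthetic 3D end-effector trajectories.
trajs = [np.cumsum(np.random.randn(100, 3) * 0.01, axis=0) for _ in range(200)]
print(len(curate(trajs)))
```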

30 pages, 4114 KB  
Article
TricP: A Novel Approach for Human Activity Recognition Using Tricky Predator Optimization Based on Inception and LSTM
by Palak Girdhar, Muslem Al-Saidi, Prashant Johri, Deepali Virmani, Hussein Taha and Oday Ali Hassen
Telecom 2026, 7(2), 32; https://doi.org/10.3390/telecom7020032 - 19 Mar 2026
Viewed by 288
Abstract
Human Activity Recognition (HAR) is a pivotal research area for applications such as automated surveillance, smart homes, security, healthcare, and human behavior analysis. Traditional machine-learning approaches often rely on manual feature engineering, which can limit generalization. Although deep learning has improved HAR through automatic representation learning, achieving high detection performance under computational constraints remains challenging. This paper proposes an efficient HAR framework that combines deep learning with hybrid optimization. Surveillance videos are first decomposed into frames, and a keyframe selection stage identifies distinctive frames to reduce redundancy and computational cost while preserving informative content. Motion and appearance features are then extracted using Histogram of Oriented Optical Flow (HOOF) and a ResNet-101 model, respectively, and concatenated into a unified feature representation. Classification is performed using an Inception-based Long Short-Term Memory (Incept-LSTM) network, which is fine-tuned via the proposed Tricky Predator Optimization (TricP) over a restricted, low-dimensional parameter vector. TricP is inspired by predator poaching behavior and the social dynamics of Latrans to enhance exploration and exploitation during search. Experiments on the UCF-Crime dataset show that the proposed method achieves 96.84% specificity, 92.16% sensitivity, and 93.62% accuracy. Full article
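Keyframe selection of the kind described above is often done by thresholding inter-frame difference; the paper's exact criterion is not given here, so the following NumPy sketch is only one common heuristic:

```python
import numpy as np

def select_keyframes(frames: np.ndarray, threshold: float = 12.0) -> list:
    """frames: (N, H, W) grayscale stack; returns indices of retained frames."""
    keep = [0]                                    # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.float32) - frames[keep[-1]].astype(np.float32))
        if diff.mean() > threshold:               # enough change since the last keyframe
            keep.append(i)
    return keep

frames = (np.random.rand(50, 120, 160) * 255).astype(np.uint8)
print(select_keyframes(frames))
```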

25 pages, 649 KB  
Article
A Multimodal Biomedical Sensing Approach for Muscle Activation Onset Detection
by Qiang Chen, Haofei Li, Zhe Xiang, Moxian Lin, Yinfei Yi, Haoran Tang and Yan Zhan
Sensors 2026, 26(6), 1907; https://doi.org/10.3390/s26061907 - 18 Mar 2026
Viewed by 208
Abstract
Muscle onset detection is a fundamental problem in electromyography signal analysis, human–machine interaction, and rehabilitation assessment. In medical and biomedical applications, slow muscle activation onset processes are widely encountered in scenarios such as rehabilitation training, postural regulation, and fine motor control. Such processes are typically characterized by slowly varying amplitudes, long temporal durations, and high susceptibility to noise interference, which poses significant challenges for accurate identification of onset timing. To address these issues, a lightweight temporal attention method for slow muscle activation onset detection is proposed and systematically validated under multimodal experimental settings. The proposed method takes surface electromyography signals as the primary input, while synchronously acquired optical motion image data are incorporated into the experimental design and result analysis, thereby aligning with the common joint use of optical imaging and physiological signals in medical and biomedical research. From a methodological perspective, the proposed framework is composed of lightweight temporal feature encoding, a slow activation-aware temporal attention mechanism, and noise suppression with stable decision strategies. Under the constraint of low computational complexity, the ability to model progressive activation signals is effectively enhanced. Experiments are conducted on a dataset containing multiple types of slow activation movements, and model performance is evaluated using five-fold cross-validation. The results demonstrate that under regular signal-to-noise ratio conditions, the proposed method significantly outperforms traditional threshold-based approaches, classical machine learning models, and several deep learning baselines in terms of onset detection accuracy, recall, and precision. Specifically, onset detection accuracy reaches approximately 92%, recall is around 90%, and precision is approximately 93%. Meanwhile, the average onset detection error and detection delay are reduced to about 41 ms and 28 ms, respectively, with the false positive rate controlled at approximately 2.2%. Stable performance is further maintained under different noise levels and cross-subject settings, indicating strong robustness and generalization capability. Full article
(This article belongs to the Special Issue Application of Optical Imaging in Medical and Biomedical Research)
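For context, the traditional threshold-based baseline the method is compared against can be as simple as rectifying the sEMG signal, smoothing it into an envelope, and marking onset where it exceeds the resting baseline by k standard deviations (all parameters here are illustrative):

```python
import numpy as np

def onset_index(emg: np.ndarray, fs: int = 1000, baseline_s: float = 0.5,
                k: float = 3.0, win: int = 50):
    """Return the first sample index where the envelope crosses the threshold, or None."""
    envelope = np.convolve(np.abs(emg), np.ones(win) / win, mode="same")
    rest = envelope[: int(baseline_s * fs)]       # assumed quiet baseline segment
    thresh = rest.mean() + k * rest.std()
    above = np.flatnonzero(envelope > thresh)
    return int(above[0]) if above.size else None
```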

22 pages, 7355 KB  
Article
IAE-Net: Incremental Learning-Based Attention-Enhanced DenseNet for Robust Facial Emotion Recognition
by Haseeb Ali Khan and Jong-Ha Lee
Mathematics 2026, 14(6), 1023; https://doi.org/10.3390/math14061023 - 18 Mar 2026
Viewed by 221
Abstract
Facial emotion recognition (FER) is an important component of human–computer interaction and healthcare-oriented affective computing. However, reliable deployment remains difficult in unconstrained settings due to appearance and geometric variability (e.g., pose, illumination, and occlusion), demographic imbalance, and dataset bias. In practice, two additional constraints frequently limit real-world FER systems: the computational overhead of heavy architectures and limited adaptability when data evolve over time, where sequential updates can cause catastrophic forgetting. To address these challenges, we propose the Incremental Attention-Enhanced Network (IAE-Net), a compact single-branch framework built on a DenseNet121 backbone and a cascaded refinement pipeline. The model incorporates Channel Attention (CA) to emphasize expression-relevant feature channels and suppress less informative responses, followed by a deformable attention module (DA) that reduces feature misalignment caused by non-rigid facial motion and pose shifts, thereby improving robustness under geometric variability. For continual deployment, IAE-Net supports class-incremental updates via weight transfer, exemplar replay, and knowledge distillation to improve retention during sequential learning. We evaluate IAE-Net on four widely used benchmarks, FER2013, FERPlus, KDEF, and AffectNet, covering both controlled and in-the-wild conditions under a unified training protocol. The proposed approach achieves accuracies of 79.15%, 92.03%, 99.48%, and 74.20% on FER2013, FERPlus, KDEF, and AffectNet, respectively, with balanced precision, recall, and F1-score trends. These results indicate that IAE-Net provides an efficient and extensible FER framework with potential utility in dynamic real-world and longitudinal healthcare-oriented applications. Full article
(This article belongs to the Special Issue Recent Advances and Applications of Artificial Neural Networks)
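The Channel Attention component has the general squeeze-and-excitation form: globally pool each channel, pass the result through a small bottleneck MLP, and reweight the feature map. A PyTorch sketch of that generic form (not the authors' exact layer configuration):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global channel context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # excite: reweight channels

x = torch.randn(2, 1024, 7, 7)                        # e.g., DenseNet121 feature maps
print(ChannelAttention(1024)(x).shape)
```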

22 pages, 1747 KB  
Review
Talking Head Generation Through Generative Models and Cross-Modal Synthesis Techniques
by Hira Nisar, Salman Masood, Zaki Malik and Adnan Abid
J. Imaging 2026, 12(3), 119; https://doi.org/10.3390/jimaging12030119 - 10 Mar 2026
Viewed by 590
Abstract
Talking Head Generation (THG) is a rapidly advancing field at the intersection of computer vision, deep learning, and speech synthesis, enabling the creation of animated human-like heads that can produce speech and express emotions with high visual realism. The core objective of THG systems is to synthesize coherent and natural audio–visual outputs by modeling the intricate relationship between speech signals, facial dynamics, and emotional cues. These systems find widespread applications in virtual assistants, interactive avatars, video dubbing for multilingual content, educational technologies, and immersive virtual and augmented reality environments. Moreover, the development of THG has significant implications for accessibility technologies, cultural preservation, and remote healthcare interfaces. This survey paper presents a comprehensive and systematic overview of the technological landscape of Talking Head Generation. We begin by outlining the foundational methodologies that underpin the synthesis process, including generative adversarial networks (GANs), motion-aware recurrent architectures, and attention-based models. A taxonomy is introduced to organize the diverse approaches based on the nature of input modalities and generation goals. We further examine the contributions of various domains such as computer vision, speech processing, and human–robot interaction, each of which plays a critical role in advancing the capabilities of THG systems. The paper also provides a detailed review of datasets used for training and evaluating THG models, highlighting their coverage, structure, and relevance. In parallel, we analyze widely adopted evaluation metrics, categorized by their focus on image quality, motion accuracy, synchronization, and semantic fidelity. Operating parameters such as latency, frame rate, resolution, and real-time capability are also discussed to assess deployment feasibility. Special emphasis is placed on the integration of generative artificial intelligence (GenAI), which has significantly enhanced the adaptability and realism of talking head systems through more powerful and generalizable learning frameworks. Full article

49 pages, 5891 KB  
Article
A Study on Autonomous Driving Motion Sickness from the Perspective of Multimodal Human Signals
by Su Young Kim and Yoon Sang Kim
Sensors 2026, 26(5), 1675; https://doi.org/10.3390/s26051675 - 6 Mar 2026
Viewed by 466
Abstract
In autonomous driving, motion sickness (MS) arises from physical or visual stimuli, or a combination of both. However, objective quantification of MS level (MSL) remains limited beyond questionnaire-based assessments. Using multimodal human signals (physiological and behavioral) collected in an autonomous driving simulator, this study addresses the association between these signals and MSL, across these MS types, by (i) screening and curating a decade of human-signal MS studies (HS-Set) to establish a data-driven foundation for selecting target sensor domains and features, (ii) constructing a dataset with subjective measures of MSL (fast motion sickness scale and simulator sickness questionnaire (SSQ)), alongside human signals (electroencephalogram (EEG), photoplethysmogram (PPG), electrodermal activity (EDA), skin temperature, and head/eye movement), (iii) conducting a correlation analysis between MSL and the identified features from HS-Set, and (iv) quantifying multivariable contributions at the feature and sensor domains through an explainable boosting machine (EBM). Key correlations include head amplitude/energy (pitch/surge) with SSQ total/oculomotor, eye entropy with nausea/oculomotor (positive), and EDA with nausea (negative). The EBM-based contribution analysis highlights EEG connectivity and head kinematics as dominant contributors; excluding EEG, the interpretability of single-domain models remains limited. Additionally, a combination of Head, PPG, and EDA domains retains over 80% of the full model’s interpretability. Full article
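The correlation step (iii) above amounts to a Pearson screen between each extracted feature and the subjective score; a SciPy sketch with hypothetical feature names and synthetic values:

```python
import numpy as np
from scipy.stats import pearsonr

features = {"head_pitch_energy": np.random.randn(40),
            "eye_entropy": np.random.randn(40),
            "eda_tonic_level": np.random.randn(40)}   # hypothetical per-trial features
ssq_total = np.random.randn(40)                       # subjective MS score per trial

for name, values in features.items():
    r, p = pearsonr(values, ssq_total)
    print(f"{name}: r={r:+.2f}, p={p:.3f}")
```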

15 pages, 2660 KB  
Article
A Comparative Study of Lower-Limb Joint Angles and Moment Estimations Across Different Gait Conditions Using OpenSim for Body-Weight Offloading Applications
by Bushira Musa, Ji Chen, Glacia Martin, Kaitlin H. Lostroscio and Alexander Peebles
Biomechanics 2026, 6(1), 27; https://doi.org/10.3390/biomechanics6010027 - 3 Mar 2026
Viewed by 455
Abstract
Background: Microgravity exposure causes muscle atrophy and bone density loss in astronauts. Traditional motion analysis provides estimations of external kinematics and muscle activation, but cannot resolve internal load. OpenSim closes this gap by applying musculoskeletal modeling to estimate internal joint mechanics. Methods: In this study, we aimed to develop an OpenSim workflow to estimate joint angles and moments using datasets from two publicly available gait studies: the Politecnico di Milano study (Dataset 1), which includes level-floor walking, walking on heels, walking on toes, and step-down-from-stairs tasks, and Maclean et al.’s walking study in reduced gravities (Dataset 2), which includes four simulated gravity levels (1.0 G, 0.76 G, 0.54 G, and 0.31 G). Marker and ground reaction force (GRF) data, along with participants’ mass, were used to prepare the first three steps of OpenSim’s workflow, including scaling, inverse kinematics (IK), and inverse dynamics (ID). Scripts using MATLAB R2025a (The MathWorks, Inc., Natick, MA, USA) were created to store, normalize, and compare OpenSim outputs with reference data on the right leg. Pearson’s correlation coefficient (PCC) was used to quantify agreement between OpenSim-derived joint angles and moments and the reference data, and root mean square error (RMSE) was used to characterize accuracy. Results: Hip and knee angles showed excellent correlation across both datasets (PCC > 0.974). Ankle angles were more variable, particularly in Dataset 1 (PCC = 0.833; RMSE = 19.797°) compared to Dataset 2 (PCC = 0.995; RMSE = 8.73°). Joint moment correlations were strong for hip and knee (PCC > 0.85), though ankle moments in Dataset 1 exhibited lower correlation (PCC = 0.677) and higher error (0.30 Nm/kg) compared to the high accuracy observed across all joints in Dataset 2. Discussion: We speculate that the lower PCC values and higher RMSE observed for ankle dorsi/plantar flexion angle and moment in Dataset 1 are mainly attributable to differences in shank segment frame definitions between the OpenSim model and the human body model used in Dataset 1. Higher ankle angle RMSEs in Dataset 2 may be due to lower weights assigned to ankle markers in the scaling and IK setup files, resulting in different ankle joint center definitions. Conclusion: In the future, we plan to improve this OpenSim workflow by including additional participants and datasets collected in simulated reduced-gravity environments and by implementing a residual reduction algorithm (RRA) and computed muscle control (CMC) to enable muscle activation estimation. Full article
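The agreement metrics used above are simple to reproduce. The paper scripts them in MATLAB; an equivalent NumPy sketch for a single joint trajectory (the curves here are synthetic placeholders) is:

```python
import numpy as np

def pcc(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])            # Pearson's correlation coefficient

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))

t = np.linspace(0, 100, 101)                         # % gait cycle (illustrative)
opensim_angle = 20 * np.sin(2 * np.pi * t / 100)     # placeholder joint-angle curve
reference_angle = opensim_angle + np.random.normal(0, 2, t.shape)
print(f"PCC={pcc(opensim_angle, reference_angle):.3f}, "
      f"RMSE={rmse(opensim_angle, reference_angle):.3f} deg")
```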