Search Results (234)

Search Parameters:
Keywords = hand motion capture

20 pages, 4888 KB  
Article
Kinematic and Muscle Activation Differences Between High-Performance and Intermediate Tennis Players During the Forehand Drive
by Bruno Pedro, Silvia Cabral, Filipa João, Andy Man Kit Lei and António P. Veloso
Sensors 2026, 26(7), 2244; https://doi.org/10.3390/s26072244 - 4 Apr 2026
Viewed by 158
Abstract
This study compared the kinematic and neuromuscular characteristics of the tennis forehand drive between high-performance (HP) and intermediate (INT) players. Eighteen right-handed male players (HP: n = 9; INT: n = 9) performed cross-court forehands while three-dimensional motion capture and surface electromyography (EMG) were recorded from the dominant upper limb and trunk. Kinematic and EMG data were time-normalized to the forward swing. One-dimensional statistical parametric mapping (SPM) two-sample t-tests were used to compare joint angles, angular and linear velocities, and EMG amplitude waveforms between groups. Bonferroni-corrected significance levels were set at α = 0.0017 for kinematic variables and α = 0.0063 for EMG data. HP players exhibited greater racket linear velocity during the final part of the forward swing, accompanied by higher shoulder, elbow and wrist linear velocities, whereas hip linear velocity did not differ between groups. Joint angles were broadly similar, with SPM revealing only slightly greater early knee flexion in HP players. In contrast, HP players showed higher hip and knee angular velocities and greater wrist angular velocities in both flexion/extension and radial/ulnar deviation towards impact. EMG patterns were generally comparable, but HP players displayed higher biceps brachii activation in two significant clusters during the mid-to-late forward swing and greater triceps brachii activation in the late forward swing. No significant differences were observed for deltoid, pectoralis major, latissimus dorsi, flexor carpi radialis or extensor carpi radialis. These findings indicate that superior forehand performance in HP players is associated primarily with refined segmental coordination, greater lower-limb and distal segment velocities, and locally increased elbow muscle activation, rather than with widespread increases in upper-limb or trunk muscle activity. Full article
(This article belongs to the Special Issue Movement Biomechanics Applications of Wearable Inertial Sensors)
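
As a rough illustration of the time-normalization step this abstract describes (resampling each forward swing onto a common 0–100% timebase so SPM can compare waveforms point by point), here is a minimal Python sketch; the synthetic data and the 101-sample convention are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def time_normalize(signal: np.ndarray, n_points: int = 101) -> np.ndarray:
    """Resample a 1-D waveform onto a fixed 0-100% phase axis."""
    x_old = np.linspace(0.0, 1.0, num=len(signal))
    x_new = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(x_new, x_old, signal)

# Hypothetical EMG envelope covering one forward swing of 437 frames.
rng = np.random.default_rng(0)
emg_envelope = np.abs(rng.standard_normal(437))
emg_101 = time_normalize(emg_envelope)
print(emg_101.shape)  # (101,) -- one sample per percent of the swing
```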

19 pages, 759 KB  
Article
Dual-Stream BiLSTM–Transformer Architecture for Real-Time Two-Handed Dynamic Sign Language Gesture Recognition
by Enachi Andrei, Turcu Corneliu-Octavian, Culea George, Andrioaia Dragos-Alexandru, Ungureanu Andrei-Gabriel and Sghera Bogdan-Constantin
Appl. Sci. 2026, 16(6), 2912; https://doi.org/10.3390/app16062912 - 18 Mar 2026
Viewed by 201
Abstract
Two-handed dynamic gesture recognition represents a fundamental component of sign language interpretation involving the modeling of temporal dependencies and inter-hand coordination. In this task, a major challenge is modeling asymmetric motion patterns, as well as bidirectional and long-range temporal dependencies. Most existing frameworks rely on early fusion strategies that merge joints, keypoints or landmarks from both hands in early processing stages, primarily to reduce model complexity and enforce a unified representation. In this work, a novel dual-stream BiLSTM–Transformer model architecture is proposed for two-handed dynamic sign language recognition, where parallel encoders process the trajectories of each hand independently. To capture spatial and temporal dependencies for each hand, an attention-based cross-hand fusion mechanism is employed, with hand landmarks extracted by the MediaPipe Hands framework as a preprocessing step to enable real-time CPU-based inference. Experimental evaluation conducted on custom Romanian Sign Language dynamic gesture datasets indicates that the proposed dual-stream-based system outperforms single-handed baselines, achieving higher recognition accuracy for asymmetric gestures and consistent performance gains for synchronized two-handed gestures. The proposed architecture represents an efficient and lightweight solution suitable for real-time sign language recognition and interpretation. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
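
The dual-stream idea above (independent per-hand encoders plus attention-based cross-hand fusion) can be sketched in a few lines of PyTorch. This toy version uses BiLSTMs and a single cross-attention layer, and omits the Transformer encoder and MediaPipe preprocessing; all dimensions and module names are hypothetical, not the authors' design:

```python
import torch
import torch.nn as nn

class DualHandEncoder(nn.Module):
    """Toy dual-stream model: one BiLSTM per hand, cross-hand attention, classifier."""
    def __init__(self, n_landmarks=21, n_classes=20, hidden=64):
        super().__init__()
        in_dim = n_landmarks * 3  # x, y, z per landmark (MediaPipe-style)
        self.left = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.right = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.cross = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, left_seq, right_seq):             # (B, T, 63) each
        l, _ = self.left(left_seq)                      # (B, T, 2*hidden)
        r, _ = self.right(right_seq)
        fused, _ = self.cross(query=l, key=r, value=r)  # left stream attends to right
        return self.head(fused.mean(dim=1))             # temporal average pooling

model = DualHandEncoder()
logits = model(torch.randn(2, 30, 63), torch.randn(2, 30, 63))
print(logits.shape)  # torch.Size([2, 20])
```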

18 pages, 1959 KB  
Article
Predictive and Reactive Control During Interception
by Mario Treviño, Nathaly Martín, Andrea Barrera and Inmaculada Márquez
Brain Sci. 2026, 16(3), 322; https://doi.org/10.3390/brainsci16030322 - 18 Mar 2026
Viewed by 304
Abstract
Background/Objectives: Successful interception of moving targets requires combining predictive control, which anticipates future target states, and reactive control, which compensates for ongoing sensory discrepancies. How these components evolve over time and are distributed across gaze and manual behavior remains unclear. We aimed to explore the time-resolved dynamics of predictive control during continuous interception and to dissociate eye and hand contributions. Methods: Human participants intercepted a moving target in a two-dimensional arena using a joystick while eye movements were recorded. Target speed was systematically varied, and visual information was selectively reduced by occluding either the target or the user-controlled cursor. Predictive control was assessed using two complementary metrics: a geometric strategy index capturing moment-to-moment spatial lead or lag relative to target motion, applied separately to gaze and manual trajectories, and root mean square error (RMSE) computed relative to current and forward-shifted target positions to quantify predictive alignment. Results: Successful interception was characterized by structured, speed-dependent transitions between predictive and reactive control rather than a fixed strategy. Predictive alignment emerged early and was dynamically reweighted as temporal constraints increased. Gaze and manual behavior showed complementary but partially dissociable predictive signatures. Occluding the target decreased predictive alignment, whereas occluding the user-controlled cursor had comparatively minor effects, indicating strong reliance on internal state estimation rather than continuous visual feedback of the effector. Conclusions: Predictive and reactive control are continuously and dynamically reweighted during interception. Their interaction unfolds within single trials and depends on target dynamics and sensory availability. These findings provide quantitative evidence for time-resolved coordination between anticipatory and feedback-driven control mechanisms in goal-directed behavior. Full article
(This article belongs to the Special Issue Predictive Processing in Brain and Behavior)
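
The forward-shifted RMSE metric described above has a simple numerical reading: if the cursor-versus-target error is minimized at a positive time shift, the cursor is leading (predicting) the target rather than trailing it. A minimal sketch on assumed 2-D trajectories (all data illustrative):

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))

def predictive_rmse(cursor, target, max_shift=30):
    """RMSE of cursor vs. target shifted forward by 0..max_shift samples.
    A minimum at shift > 0 suggests the cursor leads (predicts) the target."""
    out = [rmse(cursor, target)]
    for s in range(1, max_shift + 1):
        out.append(rmse(cursor[:-s], target[s:]))
    return np.array(out)

t = np.linspace(0, 2 * np.pi, 400)
target = np.stack([np.cos(t), np.sin(t)], axis=1)                # circular target path
cursor = np.stack([np.cos(t + 0.15), np.sin(t + 0.15)], axis=1)  # cursor leads by 0.15 rad
errors = predictive_rmse(cursor, target)
print("best forward shift (samples):", errors.argmin())          # ~10 for this lead
```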

14 pages, 4736 KB  
Article
Unsupervised Dynamic Time Warping Clustering for Robust Functional Network Identification in fNIRS Motor Tasks
by Murad Althobaiti
Sensors 2026, 26(6), 1848; https://doi.org/10.3390/s26061848 - 15 Mar 2026
Viewed by 304
Abstract
Functional near-infrared spectroscopy (fNIRS) is a valuable non-invasive modality for brain-computer interfaces (BCIs), but robust signal interpretation is challenged by the significant temporal variability of the hemodynamic response. Standard linear methods, such as Pearson correlation, often fail to capture functional connectivity when signals exhibit temporal jitter. This study validates an unsupervised Dynamic Time Warping (DTW) clustering framework to robustly identify motor networks from fNIRS data by accommodating non-linear temporal shifts. We analyzed a public fNIRS dataset (N = 30) across right-hand (RHT), left-hand (LHT), and foot tapping (FT) tasks. A robust preprocessing pipeline was implemented, including Wavelet Motion Correction and Common Average Referencing (CAR) to remove artifacts and global systemic noise. The core method involved computing Z-score normalized DTW distance matrices, followed by hierarchical clustering. To validate the framework, we benchmarked it against a standard Pearson Correlation method. Results show that the unsupervised DTW framework achieved a network identification accuracy of 53.17%, significantly outperforming the standard Pearson correlation benchmark (48.06%) with a statistically significant difference (p < 0.05). The framework successfully detected distinct, somatotopically correct modulations: superior-medial activation during foot tapping and lateralized activation during hand tapping. These findings demonstrate that unsupervised DTW clustering is a robust, data-driven approach that outperforms conventional linear methods in capturing functional networks during motor tasks, showing significant potential for next-generation asynchronous BCIs. Full article
(This article belongs to the Special Issue Advanced Sensor Technologies for Neuroimaging and Neurorehabilitation)
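
The core pipeline named in this abstract — z-score normalization, pairwise DTW distances, then hierarchical clustering — can be reproduced in miniature with NumPy and SciPy. The toy "channels" below stand in for fNIRS time courses, and the naive O(nm) DTW is for illustration only:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Plain dynamic programming DTW distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(1)
# Two time-shifted copies of a hemodynamic-like bump, two flat noise channels.
base = np.exp(-0.5 * ((np.arange(100) - 50) / 8.0) ** 2)
signals = [np.roll(base, 5), np.roll(base, -7),
           rng.normal(0, 0.05, 100), rng.normal(0, 0.05, 100)]
signals = [(s - s.mean()) / s.std() for s in signals]   # z-score normalization

n = len(signals)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(signals[i], signals[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)  # bump channels should share one cluster despite the temporal jitter
```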

22 pages, 10242 KB  
Article
Cross-Modality Whole-Heart MRI Reconstruction with Deep Motion Correction and Super-Resolution
by Jinwei Dong, Wenhao Ke, Wangbin Ding, Liqin Huang and Mingjing Yang
Sensors 2026, 26(5), 1565; https://doi.org/10.3390/s26051565 - 2 Mar 2026
Viewed by 372
Abstract
Magnetic resonance imaging (MRI) inherently suffers from motion artifacts and inter-slice misalignment, primarily due to sequential slice acquisition and the prolonged scanning time required for dynamic cardiac motion. These acquisition-induced inconsistencies often lead to anatomically implausible representations of cardiac structures, impairing subsequent clinical analyses such as 3D reconstruction and regional functional assessment. On the other hand, acquiring high-resolution MRI demands extended scan durations that increase patient burden and potential health risks. To address this challenge, we propose a deep motion correction and super-resolution whole-heart reconstruction (DeepWHR) framework. It learns cardiac structure prior knowledge from computed tomography (CT) data and transfers it to reconstruct cardiac structure from conventional misaligned, large-slice-thickness MRI images. Specifically, DeepWHR utilizes CT anatomy data to train a deep motion correction model that enables the network to capture structurally coherent and anatomically consistent representations, while MRI fine-tuning preserves modality-specific spatial characteristics, ensuring that the reconstructed results retain the intrinsic MRI data distribution. Furthermore, DeepWHR introduces an implicit neural representation module, which models continuous spatial fields, enabling multi-scale super-resolution structure reconstruction. Experiments on the CARE2024 WHS dataset validate that our method not only restores the spatial coherence of MRI-derived anatomical structures but also generates high-fidelity label representations suitable for downstream cardiac applications. This study demonstrates that DeepWHR transforms sparse, misaligned 2D label stacks into anatomically coherent, high-resolution 3D models, enhancing their reliability for clinical applications. Full article
(This article belongs to the Special Issue Emerging MRI Techniques for Enhanced Disease Diagnosis and Monitoring)
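
The implicit neural representation module mentioned above is, at its core, a coordinate MLP: a network queried at continuous (x, y, z) positions, which is what makes resolution-independent reconstruction possible. A minimal, untrained sketch (layer sizes and the query grid are assumptions, not the paper's module):

```python
import torch
import torch.nn as nn

class INR(nn.Module):
    """Coordinate MLP: maps continuous (x, y, z) positions to an intensity/label
    value, so the volume can be sampled at arbitrary (super-)resolution."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):  # (N, 3) coordinates in [-1, 1]^3
        return self.net(xyz)

model = INR()
# Query the field along one axis at a finer spacing than any acquired slice.
zs = torch.linspace(-1, 1, steps=64)
coords = torch.stack([torch.zeros(64), torch.zeros(64), zs], dim=1)
values = model(coords)  # (64, 1) continuous reconstruction along z
print(values.shape)
```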

7 pages, 5296 KB  
Proceeding Paper
Multi-Step Action Recognition for Long-Term Care Using Temporal Convolutional Network–Dynamic Time Warping–Finite State Machine and MediaPipe
by Feng-Jung Liu, Mei-Jou Lu and Min Chao
Eng. Proc. 2026, 129(1), 21; https://doi.org/10.3390/engproc2026129021 - 28 Feb 2026
Viewed by 254
Abstract
An intelligent multi-step action recognition system was designed for long-term caregiver training and assessment. Leveraging MediaPipe for precise and real-time human pose estimation, the system extracts detailed spatiotemporal body and hand keypoints. Temporal convolutional networks are employed to effectively capture temporal dependencies and complex features from sequential motion data. Dynamic time warping provides robust sequence alignment, allowing flexible comparison between performed actions and standard templates despite temporal variations in execution speed or style. A finite state machine imposes logical constraints by modeling expected action step sequences, enabling accurate detection of sequence anomalies or deviations. This hybrid architecture supports comprehensive evaluation and real-time feedback, facilitating improved caregiver skill acquisition, process adherence, and quality control within long-term care settings. The system aims to advance digital transformation in healthcare education by providing a scalable, precise, and adaptive training solution. Full article
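
Of the three components, the finite state machine is the easiest to make concrete: it accepts actions only in the expected order and records everything else as a sequence anomaly. A minimal sketch with hypothetical caregiving steps (not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class StepFSM:
    """Minimal FSM: actions must occur in the expected order; anything
    out of order is recorded as a sequence anomaly."""
    expected: list
    state: int = 0
    anomalies: list = field(default_factory=list)

    def observe(self, action: str) -> bool:
        if self.state < len(self.expected) and action == self.expected[self.state]:
            self.state += 1
            return True
        self.anomalies.append((self.state, action))
        return False

    @property
    def complete(self) -> bool:
        return self.state == len(self.expected)

fsm = StepFSM(expected=["wash_hands", "wear_gloves", "turn_patient", "check_skin"])
for act in ["wash_hands", "turn_patient", "wear_gloves", "turn_patient", "check_skin"]:
    fsm.observe(act)
print(fsm.complete, fsm.anomalies)  # True [(1, 'turn_patient')]
```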

17 pages, 980 KB  
Article
Dual-View Sign Language Recognition via Front-View Guided Feature Fusion for Automatic Sign Language Training
by Siyuan Jing and Gaorong Yan
Information 2026, 17(2), 158; https://doi.org/10.3390/info17020158 - 5 Feb 2026
Viewed by 448
Abstract
The foundation of an automatic sign language training (ASLT) system lies in word-level sign language recognition (WSLR), which refers to the translation of captured sign language signals into sign words. However, two key issues need to be addressed in this field: (1) the number of sign words in all public sign language datasets is too small, and the words do not match real-world scenarios, and (2) only single-view sign videos are typically provided, which makes solving the problem of hand occlusion difficult. In this work, we design an efficient algorithm for WSLR which is trained on our recently released NationalCSL-DP dataset. The algorithm first performs frame-level alignment of dual-view sign videos. A two-stage deep neural network is then employed to extract the spatiotemporal features of the signers, including hand motions and body gestures. Furthermore, a front-view guided early fusion (FvGEF) strategy is proposed for effective fusion of features from different views. Extensive experiments were carried out to evaluate the algorithm. The results show that the proposed algorithm significantly outperformed existing dual-view sign language recognition algorithms. Compared with several state-of-the-art methods, the proposed algorithm achieves Top-1 accuracy on the NationalCSL6707 dataset that is 10.29 and 11.38 percentage points higher than MViT and CNN + Transformer, respectively. Full article
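
The paper defines FvGEF precisely; as a loose illustration of the general idea of front-view-guided fusion, the toy block below lets front-view features gate the side-view features before a joint projection (module names, sizes, and the gating scheme are assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class FrontGuidedFusion(nn.Module):
    """Toy early-fusion block: the front-view feature produces a gate that
    reweights the side-view feature before concatenation and projection."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, front, side):            # (B, T, dim) each
        g = self.gate(front)                   # front view decides what to keep
        return self.proj(torch.cat([front, g * side], dim=-1))

fused = FrontGuidedFusion()(torch.randn(2, 16, 256), torch.randn(2, 16, 256))
print(fused.shape)  # torch.Size([2, 16, 256])
```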

15 pages, 5971 KB  
Article
A Resource-Efficient Method for Real-Time Flexion–Extension Angle Estimation with an Under-Sensorized Finger Exoskeleton
by Alessia Di Natale, Matilde Gelli, Gherardo Liverani, Alessandro Ridolfi, Benedetto Allotta and Nicola Secciani
Appl. Sci. 2026, 16(3), 1575; https://doi.org/10.3390/app16031575 - 4 Feb 2026
Viewed by 388
Abstract
Hand exoskeletons are used in rehabilitation together with serious games to enhance patient experience and, possibly, therapy outcomes. To achieve good engagement, a realistic virtual representation of hand motion is needed; however, the relationship between exoskeleton joint motion and anatomical finger kinematics is rarely obtained using low-cost procedures. This work introduces a mechanical redesign and modeling pipeline that utilizes temporary sensors to identify the exoskeleton–finger mapping, enabling qualitatively realistic virtual hand motion driven solely by the existing on-board sensor. A recently developed hand exoskeleton prototype was redesigned to host two temporary rotary encoders aligned with the MetaCarpoPhalangeal (MCP) and Proximal InterPhalangeal (PIP) joints, in addition to the actuation encoder. Healthy subjects wore the modified device and performed full flexion–extension cycles. Encoder trajectories were processed; then each cycle was approximated by a third-order polynomial in the normalized actuation angle, and a group-level model was obtained by averaging coefficients across valid cycles. Finally, the encoder-based reconstructions of MCP and PIP motion were evaluated against measurements from a gold-standard optical motion capture system. Results indicate that the proposed polynomial model enables joint-angle estimation with sufficient accuracy for interactive rehabilitation scenarios, supporting its use to drive smooth virtual hand motion from the on-board exoskeleton encoder alone. Full article
(This article belongs to the Special Issue Latest Advances and Prospects of Human-Robot Interaction (HRI))
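
The identification step described above — fitting a third-order polynomial from the normalized actuation angle to each anatomical joint angle, then averaging coefficients across cycles — maps directly onto `np.polyfit`/`np.polyval`. A minimal sketch with synthetic encoder data (coefficients and noise levels are made up for illustration):

```python
import numpy as np

# Toy identification data: normalized actuation angle (0..1) vs. measured MCP
# angle (deg), as if logged from the temporary encoders during one cycle.
theta_act = np.linspace(0.0, 1.0, 50)
mcp_meas = 5 + 70 * theta_act - 20 * theta_act**2 + 10 * theta_act**3
mcp_meas += np.random.default_rng(2).normal(0, 0.5, size=50)  # encoder noise

# Fit one third-order polynomial per cycle; a group model averages coefficients.
coeffs_cycle1 = np.polyfit(theta_act, mcp_meas, deg=3)
coeffs_cycle2 = np.polyfit(theta_act, mcp_meas + 0.3, deg=3)  # a second (toy) cycle
group_model = np.mean([coeffs_cycle1, coeffs_cycle2], axis=0)

# At run time only the on-board actuation encoder is available:
mcp_estimate = np.polyval(group_model, 0.5)  # estimated MCP angle at mid-stroke
print(round(float(mcp_estimate), 1))
```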

17 pages, 5916 KB  
Article
Three-Dimensional Shape Estimation of a Soft Finger Considering Contact States
by Naoyuki Matsuyama, Weiwei Wan and Kensuke Harada
Appl. Sci. 2026, 16(2), 717; https://doi.org/10.3390/app16020717 - 9 Jan 2026
Viewed by 385
Abstract
To achieve precise in-hand manipulation and feedback control using soft robotic fingers, it is essential to accurately measure their deformable structures. In particular, estimating the three-dimensional shape of a soft finger under contact conditions is a critical challenge, as the deformation state directly affects manipulation reliability. However, nonlinear deformations and occlusions arising from interactions with external objects make the estimation difficult. To address these issues, we propose a soft finger structure that integrates small magnets and magnetic sensors inside the body, enabling the acquisition of rich deformation information in both contact and non-contact states. The design provides a 15-dimensional time-series signal composed of motor angles, motor currents, and magnetic sensor outputs as inputs for shape estimation. Built on the sensing signals, we propose a mode-selection-based learning approach that outputs multiple candidate shapes and selects the correct one. The proposed network predicts the three-dimensional positions of four external markers attached to the finger, which serve as a proxy representation of the finger’s shape. The network is trained in a supervised manner using ground-truth marker positions measured by a motion capture system. The experimental results under both contact and non-contact conditions demonstrate that the proposed method achieves an average estimation error of approximately 4 mm, outperforming conventional one-shot regression models that output coordinates directly. Magnetic sensing is shown to enable accurate recognition of contact states and to significantly improve stability in shape estimation. Full article
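
A common way to realize the mode-selection idea above is a multi-hypothesis head trained with a winner-takes-all loss plus a classifier that picks the best candidate; whether the paper uses exactly this formulation is not stated, so the sketch below is an assumption-laden illustration only:

```python
import torch
import torch.nn as nn

K, OUT = 3, 12  # K candidate shapes; 4 markers x 3 coordinates each

class ModeSelectNet(nn.Module):
    """Toy mode-selection regressor: emit K candidate marker sets plus
    selection logits; training uses a min-over-K (winner-takes-all) loss."""
    def __init__(self, in_dim=15, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.Linear(hidden, K * OUT)   # K candidate shapes
        self.select = nn.Linear(hidden, K)        # which candidate to trust

    def forward(self, x):
        h = self.trunk(x)
        return self.heads(h).view(-1, K, OUT), self.select(h)

net = ModeSelectNet()
x = torch.randn(8, 15)                  # motor angles + currents + magnetic readings
y = torch.randn(8, OUT)                 # ground-truth marker positions (mocap)
cands, logits = net(x)
per_cand = ((cands - y.unsqueeze(1)) ** 2).mean(dim=-1)   # (8, K) candidate errors
wta_loss = per_cand.min(dim=1).values.mean()              # only the best mode is penalized
sel_loss = nn.functional.cross_entropy(logits, per_cand.argmin(dim=1))
print(float(wta_loss), float(sel_loss))
```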

11 pages, 4787 KB  
Article
Vision-Based Hand Function Evaluation with Soft Robotic Rehabilitation Glove
by Mukun Tong, Michael Cheung, Yixing Lei, Mauricio Villarroel and Liang He
Sensors 2026, 26(1), 138; https://doi.org/10.3390/s26010138 - 25 Dec 2025
Viewed by 773
Abstract
Advances in robotic technology for hand rehabilitation, particularly soft robotic gloves, have significant potential to improve patient outcomes. While vision-based algorithms pave the way for fast and convenient hand pose estimation, most current models struggle to accurately track hand movements when soft robotic gloves are used, primarily due to severe occlusion. This limitation reduces the applicability of soft robotic gloves in digital and remote rehabilitation assessment. Furthermore, traditional clinical assessments like the Fugl-Meyer Assessment (FMA) rely on manual measurements and subjective scoring scales, lacking the efficiency and quantitative accuracy needed to monitor hand function recovery in data-driven personalised rehabilitation. Consequently, few integrated evaluation systems provide reliable quantitative assessments. In this work, we propose an RGB-based evaluation system for soft robotic glove applications, which is aimed at bridging these gaps in assessing hand function. By incorporating the Hand Mesh Reconstruction (HaMeR) model fine-tuned with motion capture data, our hand estimation framework overcomes occlusion and enables accurate continuous tracking of hand movements with reduced errors. The resulting functional metrics include conventional clinical benchmarks such as the mean per joint angle error (MPJAE) and range of motion (ROM), providing quantitative, consistent measures of rehabilitation progress and achieving tracking errors lower than 10°. In addition, we introduce adapted benchmarks such as the angle percentage of correct keypoints (APCK), mean per joint angular velocity error (MPJAVE) and angular spectral arc length (SPARC) error to characterise movement stability and smoothness. This extensible and adaptable solution demonstrates the potential of vision-based systems for future clinical and home-based rehabilitation assessment. Full article
(This article belongs to the Special Issue Flexible Sensing in Robotics, Healthcare, and Beyond)
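
Two of the reported metrics are easy to state exactly: MPJAE is the mean absolute joint-angle error, and ROM is the per-joint max-minus-min excursion. A minimal sketch on synthetic angle tracks (15 joints, 200 frames, both numbers arbitrary):

```python
import numpy as np

def mpjae(pred_angles: np.ndarray, ref_angles: np.ndarray) -> float:
    """Mean per-joint angle error (degrees) over frames and joints."""
    return float(np.mean(np.abs(pred_angles - ref_angles)))

def rom(angles: np.ndarray) -> np.ndarray:
    """Range of motion per joint: max minus min over the recording."""
    return angles.max(axis=0) - angles.min(axis=0)

rng = np.random.default_rng(3)
ref = 45 + 30 * np.sin(np.linspace(0, 4 * np.pi, 200))[:, None] * np.ones((1, 15))
est = ref + rng.normal(0, 4, ref.shape)   # estimated angles with ~4 deg noise
print(f"MPJAE: {mpjae(est, ref):.1f} deg")       # roughly 3.2 deg for |N(0, 4)| noise
print(f"ROM (joint 0): {rom(est)[0]:.1f} deg")
```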

23 pages, 65396 KB  
Article
Comparative Analysis of the Accuracy and Robustness of the Leap Motion Controller 2
by Daniel Matuszczyk, Mikel Jedrusiak, Denis Fisseler and Frank Weichert
Sensors 2025, 25(24), 7473; https://doi.org/10.3390/s25247473 - 8 Dec 2025
Viewed by 1070
Abstract
Along with the ongoing success of virtual/augmented reality (VR/AR) and human–machine interaction (HMI) in the professional and consumer markets, new compatible and inexpensive hand tracking devices are required. One of the contenders in this market is the Leap Motion Controller 2 (LMC2), successor to the popular Leap Motion Controller (LMC1), which has been widely used for scientific hand-tracking applications since its introduction in 2013. To quantify ten years of advances, this study compares both controllers using quantitative tracking metrics and characterizes the interaction space above the sensor. A robot-actuated 3D-printed hand and a motion-capture system provide controlled movements and external reference data. In the central tracking volume, the LMC2 achieves improved performance, reducing palm-position error from 7.9–9.8 mm (LMC1) to 5.2–5.3 mm (LMC2) and lowering positional variability from 1.3–2.2 mm to 0.4–0.8 mm. Dynamic tests confirm stable tracking for both devices. For boundary experiments, the LMC2 maintains continuous detection at distances up to 666 mm, compared to 250–275 mm (LMC1), and detects hands entering the field of view from distances up to 646 mm. Both devices show reduced accuracy toward the edges of the tracking volume. Overall, the results provide a grounded characterization of LMC2 performance in its newly emphasized VR/AR-relevant interaction spaces, while the metrics support cross-comparison with earlier LMC1-based studies and transfer to related application scenarios. Full article
(This article belongs to the Section Sensors and Robotics)
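
The headline numbers above (mean palm-position error and positional variability) reduce to per-frame Euclidean distances against the mocap reference. A minimal sketch with synthetic positions in millimetres (noise level arbitrary):

```python
import numpy as np

def position_error_stats(tracked: np.ndarray, reference: np.ndarray):
    """Per-frame Euclidean palm-position error (mm) against a mocap reference:
    returns mean error (accuracy) and standard deviation (variability)."""
    err = np.linalg.norm(tracked - reference, axis=1)
    return err.mean(), err.std()

rng = np.random.default_rng(4)
reference = rng.uniform(-100, 100, size=(500, 3))        # mocap palm positions, mm
tracked = reference + rng.normal(0, 3, size=(500, 3))    # sensor estimate with noise
mean_err, std_err = position_error_stats(tracked, reference)
print(f"{mean_err:.1f} mm +/- {std_err:.1f} mm")
```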

32 pages, 6322 KB  
Article
Development of a Robotic Manipulator for Piano Performance via Numbered Musical Notation Recognition
by Pu-Sheng Tsai, Ter-Feng Wu and Chen-Ting Liao
Machines 2025, 13(12), 1121; https://doi.org/10.3390/machines13121121 - 5 Dec 2025
Viewed by 821
Abstract
This paper presents a piano-playing robotic system that integrates numbered musical notation recognition with automated manipulator control. The system captures the notation using a camera, applies four-point detection for perspective correction, and performs measure segmentation through an orthogonal projection method. A pixel-scanning technique is then used to locate the positions of numerical notes, pitch dots, and rhythmic markers. Digit recognition is achieved using a CNN model trained on both the MNIST handwritten digit dataset and a custom computer-font digit dataset (CFDD), enabling robust identification of numerical symbols under varying font styles. The hardware platform consists of a 3D-printed robotic hand mounted on a linear rail and driven by an ESP32-based embedded controller with custom driver circuits. According to the recognized musical notes, the manipulator executes lateral positioning and vertical key-press motions to reproduce piano melodies. Experimental results demonstrate reliable notation recognition and accurate performance execution, confirming the feasibility of combining computer vision and robotic manipulation for low-cost, automated musical performance. Full article
(This article belongs to the Special Issue Advances and Challenges in Robotic Manipulation)
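
The four-point perspective correction step is a standard homography warp; with OpenCV it is a two-call operation. A minimal sketch using a blank stand-in frame and hypothetical corner coordinates (the paper's detection of those corners is not reproduced here):

```python
import cv2
import numpy as np

def rectify_sheet(image: np.ndarray, corners: np.ndarray, out_w=800, out_h=1100):
    """Warp a photographed notation sheet to a fronto-parallel view.
    `corners`: four (x, y) points ordered TL, TR, BR, BL."""
    dst = np.float32([[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]])
    M = cv2.getPerspectiveTransform(np.float32(corners), dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))

img = np.full((600, 800, 3), 255, dtype=np.uint8)        # stand-in camera frame
corners = np.array([[120, 80], [700, 60], [720, 540], [100, 560]])
flat = rectify_sheet(img, corners)
print(flat.shape)  # (1100, 800, 3)
```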

22 pages, 1145 KB  
Article
TSMTFN: Two-Stream Temporal Shift Module Network for Efficient Egocentric Gesture Recognition in Virtual Reality
by Muhammad Abrar Hussain, Chanjun Chun and SeongKi Kim
Virtual Worlds 2025, 4(4), 58; https://doi.org/10.3390/virtualworlds4040058 - 4 Dec 2025
Cited by 1 | Viewed by 776
Abstract
Egocentric hand gesture recognition is vital for natural human–computer interaction in augmented and virtual reality (AR/VR) systems. However, most deep learning models struggle to balance accuracy and efficiency, limiting real-time use on wearable devices. This paper introduces a Two-Stream Temporal Shift Module Transformer Fusion Network (TSMTFN) that achieves high recognition accuracy with low computational cost. The model integrates Temporal Shift Modules (TSMs) for efficient motion modeling and a Transformer-based fusion mechanism for long-range temporal understanding, operating on dual RGB-D streams to capture complementary visual and depth cues. Training stability and generalization are enhanced through full-layer training from epoch 1 and MixUp/CutMix augmentations. Evaluated on the EgoGesture dataset, TSMTFN attained 96.18% top-1 accuracy and 99.61% top-5 accuracy on the independent test set with only 16 GFLOPs and 21.3M parameters, offering a 2.4–4.7× reduction in computation compared to recent state-of-the-art methods. The model runs at 15.10 samples/s, achieving real-time performance. The results demonstrate robust recognition across over 95% of gesture classes and minimal inter-class confusion, establishing TSMTFN as an efficient, accurate, and deployable solution for next-generation wearable AR/VR gesture interfaces. Full article
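
The Temporal Shift Module at the heart of TSMTFN is a near-zero-cost operation: a fraction of channels is shifted one step along the time axis so that per-frame 2-D convolutions see neighboring frames. A minimal PyTorch sketch of the standard TSM shift (the 1/8 fraction follows the original TSM paper, not necessarily this one):

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """TSM core op on x of shape (batch, time, channels, h, w): shift one
    channel fold backward and one forward in time; leave the rest in place."""
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # future -> current
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # past -> current
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # untouched channels
    return out

x = torch.randn(2, 8, 64, 14, 14)
print(temporal_shift(x).shape)  # torch.Size([2, 8, 64, 14, 14])
```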

24 pages, 13469 KB  
Article
Accessible American Sign Language Learning in Virtual Reality via Inverse Kinematics
by Jeremy Immanuel and Santiago Berrezueta-Guzman
Virtual Worlds 2025, 4(4), 57; https://doi.org/10.3390/virtualworlds4040057 - 4 Dec 2025
Cited by 3 | Viewed by 1247
Abstract
Along with the rapid advancement of Virtual Reality (VR) and the metaverse, interest in this technology has surged among game developers and in fields such as education and healthcare. VR has enabled the rise of immersive, gamified activities, whether for rehabilitation, therapy, or learning. Additionally, VR and Motion Capture (MoCap) have allowed developers to create further accessibility features for end-users with special needs. However, the excitement of using new technology often does not align with the end user’s use cases. The over-reliance on cutting-edge hardware can negatively impact most end users who lack access to such expensive tools. To this end, we conducted an inclusivity-focused study that enables learners to practice ASL in an immersive and engaging way using only head- and controller-based tracking. Our approach replaces full-body MoCap with Inverse Kinematics (IK) and simple controller mappings for upper-body pose and hand-gesture recognition, providing a low-cost, reproducible alternative to costly setups. Full article
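
For an arm, the IK substitution described above typically reduces to the classic two-bone analytic solution: given a wrist target, the shoulder and elbow angles follow from the law of cosines. A minimal planar sketch (segment lengths and the elbow-bend convention are assumptions, not the study's avatar rig):

```python
import math

def two_bone_ik(target_x, target_y, l1=0.30, l2=0.25):
    """Analytic two-bone IK (law of cosines): shoulder/elbow angles that place
    the wrist at (target_x, target_y) in the shoulder's plane."""
    d = math.sqrt(target_x**2 + target_y**2)
    d = max(min(d, l1 + l2 - 1e-6), 1e-6)          # clamp unreachable targets
    cos_elbow = (l1**2 + l2**2 - d**2) / (2 * l1 * l2)
    elbow = math.pi - math.acos(max(-1.0, min(1.0, cos_elbow)))
    cos_alpha = (l1**2 + d**2 - l2**2) / (2 * l1 * d)
    shoulder = math.atan2(target_y, target_x) - math.acos(max(-1.0, min(1.0, cos_alpha)))
    return shoulder, elbow

s, e = two_bone_ik(0.35, 0.20)
print(f"shoulder {math.degrees(s):.1f} deg, elbow {math.degrees(e):.1f} deg")
```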

12 pages, 1869 KB  
Article
Comparison of Marker-Based and Markerless Motion Capture Systems for Measuring Throwing Kinematics
by Carina Thomas, Kevin Nolte, Marcus Schmidt and Thomas Jaitner
Biomechanics 2025, 5(4), 100; https://doi.org/10.3390/biomechanics5040100 - 2 Dec 2025
Cited by 1 | Viewed by 2595
Abstract
Background: Marker-based motion capture systems are commonly used for three-dimensional movement analysis in sports. Novel markerless motion capture systems enable the collection of comparable data under more time-efficient conditions, with higher flexibility and fewer restrictions for the athletes during movement execution. Studies show comparable results between markerless and marker-based systems for kinematics of the lower extremities, especially for walking gait. For more complex movements, such as throwing, limited data on the agreement of markerless and marker-based systems are available. The aim of this study is to compare the outcome of a video-based markerless motion capture system with a marker-based approach during an artificial basketball-throwing task. Methods: Thirteen subjects performed five simulated basketball throws under laboratory conditions and were recorded simultaneously with the marker-based measurement system as well as two versions of a markerless measurement system (differing in their release date). Knee, hip, shoulder, elbow and wrist joint angles were acquired, and the root mean square distance (RMSD) was calculated for all subjects, parameters and attempts. Results: The RMSD of all joint angles between the marker-based and markerless systems ranged from 7.17° ± 3.88° to 26.66° ± 14.77°, depending on the joint. The newest version of the markerless system showed lower RMSD values than the older version: for elbow flexion, an RMSD of 16.68° ± 5.03° (capturing 93.84% of the data) compared to the older version's 22.22° ± 5.52° (87.69% of the data). While both versions showed similar results for right knee flexion, smaller differences were observed in the new version for right hip flexion, with an RMSD of 8.17° ± 3.75° compared to the older version's 13.24° ± 5.78°. Additionally, the new version demonstrated lower RMSD values for right hand flexion. Conclusions: Overall, the new version of the markerless system showed lower RMSD values across various joint angles during throwing movement analysis compared to the older version. However, the differences between markerless and marker-based systems are especially large for the upper extremities. In conclusion, it cannot be clearly determined whether the detected inter-system differences are due to inaccuracies of one system, the other, or a combination of both, as each methodology has its own limitations (soft-tissue vibration or joint-center position accuracy). Further investigations are needed to clarify the agreement between markerless and marker-based motion capture systems during complex movements. Full article
(This article belongs to the Section Sports Biomechanics)
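
The RMSD used throughout these results is the root mean square of the frame-by-frame angle difference between the two systems. A minimal sketch on a synthetic elbow-flexion track (sampling rate and noise level arbitrary):

```python
import numpy as np

def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean square distance between two joint-angle time series (deg).
    Assumes both systems are time-synchronized and equally sampled."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

t = np.linspace(0, 1, 240)                               # one throw, 240 frames
marker_based = 90 * np.sin(np.pi * t)                    # elbow flexion, marker-based
markerless = marker_based + np.random.default_rng(5).normal(0, 8, t.shape)
print(f"elbow-flexion RMSD: {rmsd(markerless, marker_based):.2f} deg")
```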
