Search Results (197)

Search Parameters:
Keywords = hand gesture classification

23 pages, 5784 KB  
Article
Learning Italian Hand Gesture Culture Through an Automatic Gesture Recognition Approach
by Chiara Innocente, Giorgio Di Pisa, Irene Lionetti, Andrea Mamoli, Manuela Vitulano, Giorgia Marullo, Simone Maffei, Enrico Vezzetti and Luca Ulrich
Future Internet 2026, 18(4), 177; https://doi.org/10.3390/fi18040177 - 24 Mar 2026
Abstract
Italian hand gestures constitute a distinctive and widely recognized form of nonverbal communication, deeply embedded in everyday interaction and cultural identity. Despite their prominence, these gestures are rarely formalized or systematically taught, posing challenges for foreign speakers and visitors seeking to interpret their meaning and pragmatic use. Moreover, their ephemeral and embodied nature complicates traditional preservation and transmission approaches, positioning them within the broader domain of intangible cultural heritage. This paper introduces a machine learning–based framework for recognizing iconic Italian hand gestures, designed to support cultural learning and engagement among foreign speakers and visitors. The approach combines RGB–D sensing with depth-enhanced geometric feature extraction, employing interpretable classification models trained on a purpose-built dataset. The recognition system is integrated into a non-immersive virtual reality application simulating an interactive digital totem conceived for public arrival spaces, providing tutorial content, real-time gesture recognition, and immediate feedback within a playful and accessible learning environment. Three supervised machine learning pipelines were evaluated, and Random Forest achieved the best overall performance. Its integration with an Isolation Forest module was further considered for deployment, achieving a macro-averaged accuracy and F1-score of 0.82 under a 5-fold cross-validation protocol. An experimental user study was conducted with 25 subjects to evaluate the proposed interactive system in terms of usability, user engagement, and learning effectiveness, obtaining favorable results and demonstrating its potential as a practical tool for cultural education and intercultural communication. Full article
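To illustrate the classification stage this abstract describes (a Random Forest classifier paired with an Isolation Forest for out-of-distribution rejection), here is a minimal scikit-learn sketch; it is not the authors' code, and the feature dimensionality, number of classes, and placeholder data are assumptions.

# Illustrative sketch only: a Random Forest gesture classifier gated by an
# Isolation Forest that rejects inputs unlike the training distribution.
# The 63-dimensional geometric features and 10 gesture classes are assumed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 63))          # placeholder geometric features
y = rng.integers(0, 10, size=500)       # placeholder gesture labels

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
novelty = IsolationForest(random_state=0).fit(X)

def predict_with_rejection(x):
    """Return a gesture label, or None when the sample looks out-of-distribution."""
    x = np.atleast_2d(x)
    if novelty.predict(x)[0] == -1:     # -1 flags an outlier
        return None
    return int(clf.predict(x)[0])

label = predict_with_rejection(X[0])
print(cross_val_score(clf, X, y, cv=5).mean())   # 5-fold CV, as in the abstract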

17 pages, 1701 KB  
Article
CLIP-ArASL: A Lightweight Multimodal Model for Arabic Sign Language Recognition
by Naif Alasmari
Appl. Sci. 2026, 16(5), 2573; https://doi.org/10.3390/app16052573 - 7 Mar 2026
Abstract
Arabic sign language (ArASL) is the primary communication medium for Deaf and hard-of-hearing people across Arabic-speaking communities. Most current ArASL recognition systems are based solely on visual features and do not incorporate linguistic or semantic information that could improve generalization and semantic grounding. This paper introduces CLIP-ArASL, a lightweight CLIP-style multimodal approach for static ArASL letter recognition that aligns visual hand gestures with bilingual textual descriptions. The approach integrates an EfficientNet-B0 image encoder with a MiniLM text encoder to learn a shared embedding space using a hybrid objective that combines contrastive and cross-entropy losses. This design supports supervised classification on seen classes and zero-shot prediction on unseen classes using textual class representations. The proposed approach is evaluated on two public datasets, ArASL2018 and ArASL21L. Under supervised evaluation, recognition accuracies of 99.25±0.14% and 91.51±1.29% are achieved, respectively. Zero-shot performance is assessed by withholding 20% of gesture classes during training and predicting them using only their textual descriptions. In this setting, accuracies of 55.2±12.15% on ArASL2018 and 37.6±9.07% on ArASL21L are obtained. These results show that multimodal vision–language alignment supports semantic transfer and enables recognition of unseen classes. Full article
(This article belongs to the Special Issue Machine Learning in Computer Vision and Image Processing)
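The hybrid objective described above (contrastive alignment plus cross-entropy classification) can be sketched as follows; this is not the paper's implementation, the EfficientNet-B0 and MiniLM encoders are replaced by precomputed embedding tensors, and the temperature and mixing weight are assumptions.

# Minimal sketch of a CLIP-style hybrid objective (contrastive + cross-entropy),
# assuming image and text embeddings are already projected into a shared space.
# The 0.07 temperature and 0.5 weighting are illustrative choices.
import torch
import torch.nn.functional as F

def hybrid_loss(img_emb, txt_emb, class_logits, labels, temperature=0.07, alpha=0.5):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature           # image-text similarity
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets)) / 2
    supervised = F.cross_entropy(class_logits, labels)      # seen-class head
    return alpha * contrastive + (1 - alpha) * supervised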

17 pages, 1732 KB  
Article
Lightweight Visual Dynamic Gesture Recognition System Based on CNN-LSTM-DSA
by Zhenxing Wang, Ziyan Wu, Ruidi Qi and Xuan Dou
Sensors 2026, 26(5), 1558; https://doi.org/10.3390/s26051558 - 2 Mar 2026
Abstract
Addressing the challenges of large-scale gesture recognition models, high computational complexity, and inefficient deployment on embedded devices, this study designs and implements a visual dynamic gesture recognition system based on a lightweight CNN-LSTM-DSA model. The system captures user hand images via a camera, extracts 21 keypoint 3D coordinates using MediaPipe, and employs a lightweight hybrid model to perform spatial and temporal feature modeling on keypoint sequences, achieving high-precision recognition of complex dynamic gestures. In static gesture recognition, the system determines the gesture state through joint angle calculation and a sliding window smoothing algorithm, ensuring smooth mapping of the servo motor angles and stability of the robotic hand’s movements. In dynamic gesture recognition, the system models the key point time series based on the CNN-LSTM-DSA hybrid model, enabling accurate classification and reproduction of gesture actions. Experimental results show that the proposed system demonstrates good robustness under various lighting and background conditions, with a static gesture recognition accuracy of up to 96%, dynamic gesture recognition accuracy of 90.19%, and an overall response delay of less than 300 ms. Full article
(This article belongs to the Section Sensing and Imaging)
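A compact PyTorch sketch of the CNN-LSTM portion of the pipeline (spatial features per frame, then temporal modeling over MediaPipe keypoint sequences) is shown below; the paper's DSA module is omitted, and the layer sizes, frame count, and class count are assumptions.

# Illustrative CNN-LSTM over MediaPipe keypoint sequences (DSA module omitted).
# Input: (batch, frames, 21 keypoints * 3 coordinates). Sizes are assumed.
import torch
import torch.nn as nn

class CnnLstmGesture(nn.Module):
    def __init__(self, n_classes=10, feat_dim=63, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame spatial features
            nn.Conv1d(feat_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)   # temporal modeling
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (B, T, 63)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)      # (B, T, 64)
        _, (h, _) = self.lstm(z)
        return self.head(h[-1])

logits = CnnLstmGesture()(torch.randn(4, 30, 63))  # 4 clips of 30 frames each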

18 pages, 1956 KB  
Article
Dynamic Occlusion-Aware Facial Expression Recognition Guided by AA-ViT
by Xiangwei Mou, Xiuping Xie, Yongfu Song and Rijun Wang
Electronics 2026, 15(4), 764; https://doi.org/10.3390/electronics15040764 - 11 Feb 2026
Abstract
In complex natural scenarios, facial expression recognition often encounters partial occlusions caused by glasses, hand gestures, and hairstyles, making it difficult for models to extract effective features and thereby reducing recognition accuracy. Existing methods often employ attention mechanisms to enhance expression-related features, but they fail to adequately address the issue where high-frequency responses in occluded regions can disperse attention weights (e.g., incorrectly focus on occluded areas), making it challenging to effectively utilize local cues around the occlusions and limiting performance improvement. To address this, this paper proposes a network based on an adaptive attention mechanism (Adaptive Attention Vision Transformer, AA-ViT). First, an Adaptive Attention module (ADA) is designed to dynamically adjust attention scores in occluded regions, enhancing the effective information in features. Next, a Dual-Branch Multi-Layer Perceptron (DB-MLP) replaces the single linear layer to improve feature representation and model classification capability. Additionally, a Random Erasure (RE) strategy is introduced to enhance model robustness. Finally, to address the issue of model training instability caused by class imbalance in the training dataset, a hybrid loss function combining Focal Loss and Cross-Entropy Loss is adopted to ensure training stability. Experimental results show that AA-ViT achieves expression recognition accuracies of 90.66% and 90.01% on the RAF-DB and FERPlus datasets, respectively, representing improvements of 4.58 and 18.9 percentage points over the baseline ViT model, with only a 24.3% increase in parameter count. Compared to existing methods, the proposed approach demonstrates superior performance in occluded facial expression recognition tasks. Full article
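The hybrid Focal plus Cross-Entropy loss mentioned for handling class imbalance can be written in a few lines; the sketch below is not the authors' configuration, and the focusing parameter gamma and the 0.5 mixing weight are assumptions.

# Sketch of a hybrid Focal + Cross-Entropy loss for class-imbalanced training.
# Gamma and the mixing weight are assumed values, not the paper's settings.
import torch
import torch.nn.functional as F

def focal_ce_loss(logits, targets, gamma=2.0, weight=0.5):
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                          # probability of the true class
    focal = ((1 - pt) ** gamma * ce).mean()      # down-weights easy examples
    return weight * focal + (1 - weight) * ce.mean()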

17 pages, 7804 KB  
Article
A 3D Camera-Based Approach for Real-Time Hand Configuration Recognition in Italian Sign Language
by Luca Ulrich, Asia De Luca, Riccardo Miraglia, Emma Mulassano, Simone Quattrocchio, Giorgia Marullo, Chiara Innocente, Federico Salerno and Enrico Vezzetti
Sensors 2026, 26(3), 1059; https://doi.org/10.3390/s26031059 - 6 Feb 2026
Cited by 1
Abstract
Deafness poses significant challenges to effective communication, particularly in contexts where access to sign language interpreters is limited. Hand configuration recognition represents a fundamental component of sign language understanding, as configurations constitute a core cheremic element in many sign languages, including Italian Sign Language (LIS). In this work, we address configuration-level recognition as an independent classification task and propose a machine vision framework based on RGB-D sensing. The proposed approach combines MediaPipe-based hand landmark extraction with normalized three-dimensional geometric features and a Support Vector Machine classifier. The first contribution of this study is the formulation of LIS hand configuration recognition as a standalone, configuration-level problem, decoupled from temporal gesture modeling. The second contribution is the integration of sensor-acquired RGB-D depth measurements into the landmark-based feature representation, enabling a direct comparison with estimated depth obtained from monocular data. The third contribution consists of a systematic experimental evaluation on two LIS configuration sets (6 and 16 classes), demonstrating that the use of real depth significantly improves classification performance and class separability, particularly for geometrically similar configurations. The results highlight the critical role of depth quality in configuration-level recognition and provide insights into the design of robust vision-based systems for LIS analysis. Full article
(This article belongs to the Special Issue Sensing and Machine Learning Control: Progress and Applications)
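A minimal sketch of the landmark-based feature pipeline described here (wrist-centred, scale-normalized 3D landmarks fed to an SVM) follows; landmark indexing follows the MediaPipe convention, while the kernel, class count, and placeholder data are assumptions rather than the authors' settings.

# Normalize 21 hand landmarks (x, y, z) to be translation- and scale-invariant,
# then classify with an SVM. Index 0 is the wrist and index 9 the middle-finger
# base in the MediaPipe convention; the rest of the setup is assumed.
import numpy as np
from sklearn.svm import SVC

def normalize_landmarks(lm):                 # lm: (21, 3) array
    centered = lm - lm[0]                    # wrist-centred
    scale = np.linalg.norm(centered[9]) or 1.0
    return (centered / scale).ravel()        # 63-dim feature vector

rng = np.random.default_rng(0)
X = np.stack([normalize_landmarks(rng.normal(size=(21, 3))) for _ in range(200)])
y = rng.integers(0, 6, size=200)             # e.g. the 6-class configuration set
clf = SVC(kernel="rbf").fit(X, y)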

21 pages, 2592 KB  
Article
Parsing Emotion in Classical Music: A Behavioral Study on the Cognitive Mapping of Key, Tempo, Complexity and Energy in Piano Performance
by Alice Mado Proverbio, Chang Qin and Miloš Milovanović
Appl. Sci. 2026, 16(3), 1371; https://doi.org/10.3390/app16031371 - 29 Jan 2026
Abstract
Music conveys emotion through a complex interplay of structural and acoustic cues, yet how these features map onto specific affective interpretations remains a key question in music cognition. This study explored how listeners, unaware of contextual information, categorized 110 emotionally diverse excerpts—varying in key, tempo, note density, acoustic energy, and expressive gestures—from works by Bach, Beethoven, and Chopin. Twenty classically trained participants labeled each excerpt using six predefined emotional categories. Emotion judgments were analyzed within a supervised multi-class classification framework, allowing systematic quantification of recognition accuracy, misclassification patterns, and category reliability. Behavioral responses were consistently above chance, indicating shared decoding strategies. Quantitative analyses of live performance recordings revealed systematic links between expressive features and emotional tone: high-arousal emotions showed increased acoustic intensity, faster gestures, and dominant right-hand activity, while low-arousal states involved softer dynamics and more left-hand involvement. Major-key excerpts were commonly associated with positive emotions—“Peacefulness” with slow tempos and low intensity, “Joy” with fast, energetic playing. Minor-key excerpts were linked to negative/ambivalent emotions, aligning with prior research on the emotional complexity of minor modality. Within the minor mode, a gradient of arousal emerged, from “Melancholy” to “Power,” the latter marked by heightened motor activity and sonic force. Results support an embodied view of musical emotion, where expressive meaning emerges through dynamic motor-acoustic patterns that transcend stylistic and cultural boundaries. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)

24 pages, 6118 KB  
Article
Effective Approach for Classifying EMG Signals Through Reconstruction Using Autoencoders
by Natalia Rendón Caballero, Michelle Rojo González, Marcos Aviles, José Manuel Alvarez Alvarado, José Billerman Robles-Ocampo, Perla Yazmin Sevilla-Camacho and Juvenal Rodríguez-Reséndiz
AI 2026, 7(1), 36; https://doi.org/10.3390/ai7010036 - 22 Jan 2026
Abstract
The study of muscle signal classification has been widely explored for the control of myoelectric prostheses. Traditional approaches rely on manually designed features extracted from time- or frequency-domain representations, which may limit the generalization and adaptability of EMG-based systems. In this work, an autoencoder-based framework is proposed for automatic feature extraction, enabling the learning of compact latent representations directly from raw EMG signals and reducing dependence on handcrafted features. A custom instrumentation system with three surface EMG sensors was developed and placed on selected forearm muscles to acquire signals associated with five hand movements from 20 healthy participants aged 18 to 40 years. The signals were segmented into 200 ms windows with 75% overlap. The proposed method employs a recurrent autoencoder with a symmetric encoder–decoder architecture, trained independently for each sensor to achieve accurate signal reconstruction, with a minimum reconstruction loss of 3.3 × 10⁻⁴ V². The encoder’s latent representations were then used to train a dense neural network for gesture classification. An overall efficiency of 93.84% was achieved, demonstrating that the proposed reconstruction-based approach provides high classification performance and represents a promising solution for future EMG-based assistive and control applications. Full article
(This article belongs to the Special Issue Transforming Biomedical Innovation with Artificial Intelligence)
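The core idea (an LSTM encoder-decoder that reconstructs a 200 ms EMG window and exposes a latent code for a downstream dense classifier) can be sketched as below; this is not the authors' architecture, and the latent size, hidden sizes, and class count are assumptions.

# Minimal LSTM autoencoder for EMG windows plus a dense classifier on the latent
# code. Window length, latent size, and the 5 movement classes are assumed.
import torch
import torch.nn as nn

class EmgAutoencoder(nn.Module):
    def __init__(self, latent=16, win=200):
        super().__init__()
        self.win = win
        self.encoder = nn.LSTM(1, latent, batch_first=True)
        self.decoder = nn.LSTM(latent, latent, batch_first=True)
        self.out = nn.Linear(latent, 1)

    def forward(self, x):                       # x: (B, 200, 1) raw EMG window
        _, (h, _) = self.encoder(x)
        code = h[-1]                            # compact latent representation
        rep = code.unsqueeze(1).repeat(1, self.win, 1)
        dec, _ = self.decoder(rep)
        return self.out(dec), code

ae = EmgAutoencoder()
x = torch.randn(8, 200, 1)
recon, latent = ae(x)
loss = nn.functional.mse_loss(recon, x)         # reconstruction objective
classifier = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
logits = classifier(latent)                     # gesture classification head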

27 pages, 4802 KB  
Article
Fine-Grained Radar Hand Gesture Recognition Method Based on Variable-Channel DRSN
by Penghui Chen, Siben Li, Chenchen Yuan, Yujing Bai and Jun Wang
Electronics 2026, 15(2), 437; https://doi.org/10.3390/electronics15020437 - 19 Jan 2026
Abstract
With the ongoing miniaturization of smart devices, fine-grained hand gesture recognition using millimeter-wave radar has attracted increasing attention, yet practical deployment remains challenging in continuous-gesture segmentation, robust feature extraction, and reliable classification. This paper presents an end-to-end fine-grained gesture recognition framework based on frequency-modulated continuous-wave (FMCW) millimeter-wave radar, including gesture design, data acquisition, feature construction, and neural network-based classification. Ten gesture types are recorded (eight valid gestures and two return-to-neutral gestures); for classification, the two return-to-neutral gesture types are merged into a single invalid class, yielding a nine-class task. A sliding-window segmentation method is developed using short-time Fourier transform (STFT)-based Doppler-time representations, and a dataset of 4050 labeled samples is collected. Multiple signal classification (MUSIC)-based super-resolution estimation is adopted to construct range–time and angle–time representations, and instance-wise normalization is applied to Doppler and range features to mitigate inter-individual variability without test leakage. For recognition, a variable-channel deep residual shrinkage network (DRSN) is employed to improve robustness to noise, supporting single-, dual-, and triple-channel feature inputs. Results under both subject-dependent evaluation with repeated random splits and subject-independent leave-one-subject-out (LOSO) cross-validation show that the DRSN architecture consistently outperforms the RefineNet-based baseline, and the triple-channel configuration achieves the best performance (98.88% accuracy). Overall, the variable-channel design enables flexible feature selection to meet diverse application requirements. Full article
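The Doppler-time representation and sliding-window activity detection mentioned above can be sketched in a few lines; the sampling rate, window lengths, and energy threshold below are assumptions, and the placeholder signal stands in for the radar beat signal.

# Build a Doppler-time map from a radar slow-time signal with an STFT, then scan
# it with a sliding energy window to flag gesture activity. Parameters are assumed.
import numpy as np
from scipy.signal import stft

fs = 1000.0                                    # slow-time (chirp) rate, assumed
sig = np.random.randn(4000).astype(complex)    # placeholder beat signal
f, t, Z = stft(sig, fs=fs, nperseg=128, noverlap=96, return_onesided=False)
doppler_time = np.abs(np.fft.fftshift(Z, axes=0))   # Doppler bins over time

energy = doppler_time.sum(axis=0)              # per-frame Doppler energy
win = 10
active = np.convolve(energy, np.ones(win) / win, mode="same") > energy.mean() * 1.5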

27 pages, 11232 KB  
Article
Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2
by Sergei Kondratev, Yulia Dyrchenkova, Georgiy Nikitin, Leonid Voskov, Vladimir Pikalov and Victor Meshcheryakov
Technologies 2026, 14(1), 69; https://doi.org/10.3390/technologies14010069 - 16 Jan 2026
Abstract
This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises two hierarchical control levels: (1) high-level discrete command control utilizing a fully connected neural network classifier for static gesture recognition, and (2) low-level continuous flight control based on three-dimensional hand keypoint analysis from a depth camera. The gesture classification module achieves an accuracy exceeding 99% using a multi-layer perceptron trained on MediaPipe-extracted hand landmarks. For continuous control, we propose a novel approach that computes Euler angles (roll, pitch, yaw) and throttle from 3D hand pose estimation, enabling intuitive four-degree-of-freedom quadcopter manipulation. A hybrid signal filtering pipeline ensures robust control signal generation while maintaining real-time responsiveness. Comparative user studies demonstrate that gesture-based control reduces task completion time by 52.6% for beginners compared to conventional remote controllers. The results confirm the viability of vision-based gesture interfaces for IoT-enabled UAV applications. Full article
(This article belongs to the Section Information and Communication Technologies)
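The discrete-command path described above (a small fully connected network classifying static gestures from MediaPipe hand landmarks) can be sketched as follows; the hidden-layer sizes, command set, and placeholder features are assumptions, not the paper's configuration.

# Minimal MLP classifying static gestures from flattened MediaPipe landmarks
# (21 x 3 values). Layer sizes and the 8-command vocabulary are assumed.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 63))                 # wrist-relative landmark features
y = rng.integers(0, 8, size=600)               # e.g. takeoff, land, hover, ...

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500).fit(X, y)
command = mlp.predict(X[:1])[0]                # forwarded to the flight controller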

24 pages, 15172 KB  
Article
Real-Time Hand Gesture Recognition for IoT Devices Using FMCW mmWave Radar and Continuous Wavelet Transform
by Anna Ślesicka and Adam Kawalec
Electronics 2026, 15(2), 250; https://doi.org/10.3390/electronics15020250 - 6 Jan 2026
Abstract
This paper presents an intelligent framework for real-time hand gesture recognition using Frequency-Modulated Continuous-Wave (FMCW) mmWave radar and deep learning. Unlike traditional radar-based recognition methods that rely on Discrete Fourier Transform (DFT) signal representations and focus primarily on classifier optimization, the proposed system introduces a novel pre-processing stage based on the Continuous Wavelet Transform (CWT). The CWT enables the extraction of discriminative time–frequency features directly from raw radar signals, improving the interpretability and robustness of the learned representations. A lightweight convolutional neural network architecture is then designed to process the CWT maps for efficient classification on edge IoT devices. Experimental validation with data collected from 20 participants performing five standardized gestures demonstrates that the proposed framework achieves an accuracy of up to 99.87% using the Morlet wavelet, with strong generalization to unseen users (82–84% accuracy). The results confirm that the integration of CWT-based radar signal processing with deep learning forms a computationally efficient and accurate intelligent system for human–computer interaction in real-time IoT environments. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)
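The CWT pre-processing stage can be sketched with PyWavelets as below; the scale range, sampling rate, and normalization are assumptions, and the random array stands in for a real radar channel.

# Turn a radar time series into a time-frequency scalogram with a Morlet wavelet,
# ready for a small CNN. Scales and sampling rate are assumed values.
import numpy as np
import pywt

fs = 2000.0                                    # assumed sampling rate
sig = np.random.randn(2048)                    # placeholder radar channel
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(sig, scales, "morl", sampling_period=1.0 / fs)
scalogram = np.abs(coeffs)                     # (scales, time) map for the CNN
scalogram /= scalogram.max()                   # simple per-sample normalization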

17 pages, 1312 KB  
Article
RGB Fusion of Multiple Radar Sensors for Deep Learning-Based Traffic Hand Gesture Recognition
by Hüseyin Üzen
Electronics 2026, 15(1), 140; https://doi.org/10.3390/electronics15010140 - 28 Dec 2025
Abstract
Hand gesture recognition (HGR) systems play a critical role in modern intelligent transportation frameworks by enabling reliable communication between pedestrians, traffic operators, and autonomous vehicles. This work presents a novel traffic hand gesture recognition method that combines nine grayscale radar images captured from multiple millimeter-wave radar nodes into a single RGB representation through an optimized rotation–shift fusion strategy. This transformation preserves complementary spatial information while minimizing inter-image interference, enabling deep learning models to more effectively utilize the distinctive micro-Doppler and spatial patterns embedded in radar measurements. Extensive experimental studies were conducted to verify the model’s performance, demonstrating that the proposed RGB fusion approach provides higher classification accuracy than single-sensor or unfused representations. In addition, the proposed model outperformed state-of-the-art methods in the literature with an accuracy of 92.55%. These results highlight its potential as a lightweight yet powerful solution for reliable gesture interpretation in future intelligent transportation and human–vehicle interaction systems. Full article
(This article belongs to the Special Issue Advanced Techniques for Multi-Agent Systems)
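As a simplified stand-in for the channel-packing step (not the paper's optimized rotation–shift fusion), nine single-channel radar images can be collapsed into one 3-channel array by grouping three images per channel; the image size and grouping below are assumptions.

# Combine nine grayscale radar maps into an RGB-like array, three per channel.
# This only illustrates channel packing, not the optimized fusion strategy.
import numpy as np

radar_imgs = np.random.rand(9, 64, 64)         # nine grayscale radar maps
rgb = np.stack([radar_imgs[i:i + 3].mean(axis=0) for i in (0, 3, 6)], axis=-1)
rgb = (255 * rgb / rgb.max()).astype(np.uint8)  # (64, 64, 3) input for a CNN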

19 pages, 4225 KB  
Article
Integration of EMG and Machine Learning for Real-Time Control of a 3D-Printed Prosthetic Arm
by Adedotun Adetunla, Chukwuebuka Anulunko, Tien-Chien Jen and Choon Kit Chan
Prosthesis 2025, 7(6), 166; https://doi.org/10.3390/prosthesis7060166 - 16 Dec 2025
Cited by 1
Abstract
Background: Advancements in low-cost additive manufacturing and artificial intelligence have enabled new avenues for developing accessible myoelectric prostheses. However, achieving reliable real-time control and ensuring mechanical durability remain significant challenges, particularly for affordable systems designed for resource-constrained settings. Objective: This study aimed to design and validate a low-cost, 3D-printed prosthetic arm that integrates single-channel electromyography (EMG) sensing with machine learning for real-time gesture classification. The device incorporates an anatomically inspired structure with 14 passive mechanical degrees of freedom (DOF) and 5 actively actuated tendon-driven DOF. The objective was to evaluate the system’s ability to recognize open, close, and power-grip gestures and to assess its functional grasping performance. Method: A Fast Fourier Transform (FFT)-based feature extraction pipeline was implemented on single-channel EMG data collected from able-bodied participants. A Support Vector Machine (SVM) classifier was trained on 5000 EMG samples to distinguish three gesture classes and benchmarked against alternative models. Mechanical performance was assessed through power-grip evaluation, while material feasibility was examined using PLA-based 3D-printed components. No amputee trials or long-term durability tests were conducted in this phase. Results: The SVM classifier achieved 92.7% accuracy, outperforming K-Nearest Neighbors and Artificial Neural Networks. The prosthetic hand demonstrated a 96.4% power-grip success rate, confirming stable grasping performance despite its simplified tendon-driven actuation. Limitations include the reliance on single-channel EMG, testing restricted to able-bodied subjects, and the absence of dynamic loading or long-term mechanical reliability assessments, which collectively limit clinical generalizability. Overall, the findings confirm the technical feasibility of integrating low-cost EMG sensing, machine learning, and 3D printing for real-time prosthetic control while emphasizing the need for expanded biomechanical testing and amputee-specific validation prior to clinical application. Full article
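The FFT-based feature extraction followed by an SVM can be sketched as below; the window length, sampling rate, band splitting, and three-gesture label set are assumptions, and the random arrays stand in for recorded EMG windows.

# FFT band-energy features from single-channel EMG windows, classified with an SVM.
# Window length, sampling rate, and band count are assumed values.
import numpy as np
from sklearn.svm import SVC

def fft_features(window, n_bands=8):
    spec = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    bands = np.array_split(spec, n_bands)      # coarse spectral-energy bands
    return np.array([b.sum() for b in bands])

rng = np.random.default_rng(0)
windows = rng.normal(size=(300, 256))          # placeholder 256-sample windows
X = np.stack([fft_features(w) for w in windows])
y = rng.integers(0, 3, size=300)               # open, close, power grip
clf = SVC(kernel="rbf").fit(X, y)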

19 pages, 3770 KB  
Article
Evaluating Stroke-Related Motor Impairment and Recovery Using Macroscopic and Microscopic Features of HD-sEMG
by Wenting Qin, Xin Tan, Yi Yu, Yujie Zhang, Zhanhui Lin, Chenyun Dai, Yuxiang Yang, Lingyu Liu and Lingjing Jin
Bioengineering 2025, 12(12), 1357; https://doi.org/10.3390/bioengineering12121357 - 12 Dec 2025
Abstract
Stroke-induced motor impairment necessitates objective and quantitative assessment tools for rehabilitation planning. In this study, a gesture-specific framework based on high-density surface electromyography (HD-sEMG) was developed to characterize neuromuscular dysfunction using eight macroscopic features and two microscopic motor unit decomposition features. HD-sEMG recordings were collected from stroke patients (n = 11; affected and unaffected sides) and healthy controls (n = 8; dominant side) during seven standardized hand gestures. Feature-level comparisons revealed hierarchical abnormalities, with the affected side showing significantly reduced activation/coordination relative to healthy controls, while the unaffected side exhibited intermediate deviations. For each gesture, dedicated K-nearest neighbors (KNN) models were constructed for clinical validation. For Brunnstrom stage classification, wrist extension yielded the best performance, achieving 92.08% accuracy and effectively discriminating severe (Stage 4), moderate (Stage 5), and mild (Stage 6) impairment as well as healthy controls. For fine motor recovery prediction, the thumb–index–middle finger pinch provided the optimal regression performance, predicting Upper Extremity Fugl–Meyer Assessment (UE-FMA) scores with R = 0.86 and RMSE = 3.24. These results indicate that gesture selection should be aligned with the clinical endpoint: wrist extension is most informative for gross recovery staging, whereas pinch gestures better capture fine motor control. Overall, the proposed HD-sEMG framework provides an objective approach for monitoring post-stroke recovery and supporting personalized rehabilitation assessment. Full article
(This article belongs to the Section Biosignal Processing)
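The gesture-specific modeling idea (one KNN model per gesture, trained on HD-sEMG feature vectors to predict the impairment category) can be sketched as below; the feature dimensionality, value of k, and label coding are assumptions, not the study's settings.

# One KNN classifier per gesture, mapping HD-sEMG features to an impairment class.
# Feature size, k, and the four-category coding are assumed.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
features = {g: rng.normal(size=(60, 10)) for g in range(7)}   # 7 gestures
labels = {g: rng.integers(0, 4, size=60) for g in range(7)}   # stages / healthy

models = {g: KNeighborsClassifier(n_neighbors=5).fit(features[g], labels[g])
          for g in range(7)}
stage = models[3].predict(features[3][:1])[0]   # e.g. the wrist-extension model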

31 pages, 9303 KB  
Article
Automatic Quadrotor Dispatch Missions Based on Air-Writing Gesture Recognition
by Pu-Sheng Tsai, Ter-Feng Wu and Yen-Chun Wang
Processes 2025, 13(12), 3984; https://doi.org/10.3390/pr13123984 - 9 Dec 2025
Abstract
This study develops an automatic dispatch system for quadrotor UAVs that integrates air-writing gesture recognition with a graphical user interface (GUI). The DJI RoboMaster quadrotor UAV (DJI, Shenzhen, China) was employed as the experimental platform, combined with an ESP32 microcontroller (Espressif Systems, Shanghai, China) and the RoboMaster SDK (version 3.0). On the Python (version 3.12.7) platform, a GUI was implemented using Tkinter (version 8.6), allowing users to input addresses or landmarks, which were then automatically converted into geographic coordinates and imported into Google Maps for route planning. The generated flight commands were transmitted to the UAV via a UDP socket, enabling remote autonomous flight. For gesture recognition, a Raspberry Pi integrated with the MediaPipe Hands module was used to capture 16 types of air-written flight commands in real time through a camera. The training samples were categorized into one-dimensional coordinates and two-dimensional images. In the one-dimensional case, X/Y axis coordinates were concatenated after data augmentation, interpolation, and normalization. In the two-dimensional case, three types of images were generated, namely font trajectory plots (T-plots), coordinate-axis plots (XY-plots), and composite plots combining the two (XYT-plots). To evaluate classification performance, several machine learning and deep learning architectures were employed, including a multi-layer perceptron (MLP), support vector machine (SVM), one-dimensional convolutional neural network (1D-CNN), and two-dimensional convolutional neural network (2D-CNN). The results demonstrated effective recognition accuracy across different models and sample formats, verifying the feasibility of the proposed air-writing trajectory framework for non-contact gesture-based UAV control. Furthermore, by combining gesture recognition with a GUI-based map planning interface, the system enhances the intuitiveness and convenience of UAV operation. Future extensions, such as incorporating aerial image object recognition, could extend the framework’s applications to scenarios including forest disaster management, vehicle license plate recognition, and air pollution monitoring. Full article
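The one-dimensional sample format described above (interpolated and normalized X/Y coordinates concatenated into a single vector) can be sketched as follows; the target length of 64 points and the normalization scheme are assumptions.

# Resample an air-written x/y trajectory to a fixed length, normalize each axis,
# and concatenate into one feature vector for an MLP, SVM, or 1D-CNN.
import numpy as np

def trajectory_to_vector(xs, ys, n_points=64):
    t_old = np.linspace(0.0, 1.0, len(xs))
    t_new = np.linspace(0.0, 1.0, n_points)
    x = np.interp(t_new, t_old, xs)             # resample to a common length
    y = np.interp(t_new, t_old, ys)
    x = (x - x.mean()) / (x.std() + 1e-8)       # per-axis normalization
    y = (y - y.mean()) / (y.std() + 1e-8)
    return np.concatenate([x, y])               # 128-dim feature vector

vec = trajectory_to_vector(np.cumsum(np.random.rand(90)), np.random.rand(90))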

38 pages, 3741 KB  
Article
Hybrid Convolutional Vision Transformer for Robust Low-Channel sEMG Hand Gesture Recognition: A Comparative Study with CNNs
by Ruthber Rodriguez Serrezuela, Roberto Sagaro Zamora, Daily Milanes Hermosilla, Andres Eduardo Rivera Gomez and Enrique Marañon Reyes
Biomimetics 2025, 10(12), 806; https://doi.org/10.3390/biomimetics10120806 - 3 Dec 2025
Cited by 1
Abstract
Hand gesture classification using surface electromyography (sEMG) is fundamental for prosthetic control and human–machine interaction. However, most existing studies focus on high-density recordings or large gesture sets, leaving limited evidence on performance in low-channel, reduced-gesture configurations. This study addresses this gap by comparing a classical convolutional neural network (CNN), inspired by Atzori’s design, with a Convolutional Vision Transformer (CViT) tailored for compact sEMG systems. Two datasets were evaluated: a proprietary Myo-based collection (10 subjects, 8 channels, six gestures) and a subset of NinaPro DB3 (11 transradial amputees, 12 channels, same gestures). Both models were trained using standardized preprocessing, segmentation, and balanced windowing procedures. Results show that the CNN performs robustly on homogeneous signals (Myo: 94.2% accuracy) but exhibits increased variability in amputee recordings (NinaPro: 92.0%). In contrast, the CViT consistently matches or surpasses the CNN, reaching 96.6% accuracy on Myo and 94.2% on NinaPro. Statistical analyses confirm significant differences in the Myo dataset. The objective of this work is to determine whether hybrid CNN–ViT architectures provide superior robustness and generalization under low-channel sEMG conditions. Rather than proposing a new architecture, this study delivers the first systematic benchmark of CNN and CViT models across amputee and non-amputee subjects using short windows, heterogeneous signals, and identical protocols, highlighting their suitability for compact prosthetic–control systems. Full article
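The shared windowing step used before either the CNN or the CViT can be sketched as below; the window length and overlap are illustrative values, not the paper's exact settings.

# Slice a multi-channel sEMG recording into fixed-length, overlapping windows.
# Window length and overlap are assumed values.
import numpy as np

def sliding_windows(emg, win=200, overlap=0.5):
    """emg: (samples, channels) -> (n_windows, win, channels)."""
    step = int(win * (1 - overlap))
    starts = range(0, emg.shape[0] - win + 1, step)
    return np.stack([emg[s:s + win] for s in starts])

windows = sliding_windows(np.random.randn(4000, 8))   # e.g. 8-channel Myo data
print(windows.shape)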
