Search Results (201)

Search Parameters:
Keywords = sign gestures

15 pages, 2164 KB  
Article
Real-Time Chinese Sign Language Gesture Prediction Based on Surface EMG Sensors and Artificial Neural Network
by Jinrun Cheng, Xing Hu and Kuo Yang
Electronics 2025, 14(22), 4374; https://doi.org/10.3390/electronics14224374 - 9 Nov 2025
Viewed by 172
Abstract
Sign language recognition aims to capture and classify hand and arm motion signals to enable intuitive communication for individuals with hearing and speech impairments. This study proposes a real-time Chinese Sign Language (CSL) recognition framework that integrates a dual-stage segmentation strategy with a lightweight three-layer artificial neural network to achieve early gesture prediction before completion of motion sequences. The system was evaluated on a 21-class CSL dataset containing several highly similar gestures and achieved an accuracy of 91.5%, with low average inference latency per cycle. Furthermore, training set truncation experiments demonstrate that using only the first 50% of each gesture instance preserves model accuracy while reducing training time by half, thereby enhancing real-time efficiency and practical deployability for embedded or assistive applications. Full article
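
As a rough illustration of the early-prediction idea above, the sketch below trains a small three-layer network on synthetic sEMG windows twice: once on full gestures and once on only their first half. The channel count, window length, and RMS/MAV features are our assumptions, not the authors' pipeline.

```python
# Minimal sketch (not the authors' code): early CSL gesture prediction from synthetic
# sEMG windows with a small three-layer MLP, comparing full gestures vs. only the
# first 50% of frames. Channel count, window length, and features are assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_gestures, n_classes, n_channels, n_frames = 2100, 21, 8, 200   # assumed dataset shape
raw = rng.standard_normal((n_gestures, n_channels, n_frames))
labels = rng.integers(0, n_classes, size=n_gestures)

def features(x, keep_fraction=1.0):
    """RMS + mean absolute value per channel over the first keep_fraction of frames."""
    head = x[..., : int(x.shape[-1] * keep_fraction)]
    return np.concatenate([np.sqrt((head ** 2).mean(-1)), np.abs(head).mean(-1)], axis=-1)

for frac in (1.0, 0.5):                                  # full gesture vs. early prediction
    X = features(raw, frac)
    Xtr, Xte, ytr, yte = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(Xtr, ytr)
    print(f"truncation={frac:.0%}  test accuracy={clf.score(Xte, yte):.3f}")
```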

41 pages, 6004 KB  
Article
Hybrid Deep Learning Models for Arabic Sign Language Recognition in Healthcare Applications
by Ibtihel Mansour, Mohamed Hamroun, Sonia Lajmi, Ryma Abassi and Damien Sauveron
Big Data Cogn. Comput. 2025, 9(11), 281; https://doi.org/10.3390/bdcc9110281 - 8 Nov 2025
Viewed by 155
Abstract
Deaf and hearing-impaired individuals rely on sign language, a visual communication system using hand shapes, facial expressions, and body gestures. Sign languages vary by region. For example, Arabic Sign Language (ArSL) is notably different from American Sign Language (ASL). This project focuses on creating an Arabic Sign Language Recognition (ArSLR) System tailored for healthcare, aiming to bridge communication gaps resulting from a lack of sign-proficient professionals and limited region-specific technological solutions. Our research addresses limitations in sign language recognition systems by introducing a novel framework centered on ResNet50ViT, a hybrid architecture that synergistically combines ResNet50’s robust local feature extraction with the global contextual modeling of Vision Transformers (ViT). We also explored a tailored Vision Transformer variant (SignViT) for Arabic Sign Language as a comparative model. Our main contribution is the ResNet50ViT model, which significantly outperforms existing approaches, specifically targeting the challenges of capturing sequential hand movements, which traditional CNN-based methods struggle with. We utilized an extensive dataset incorporating both static (36 signs) and dynamic (92 signs) medical signs. Through targeted preprocessing techniques and optimization strategies, we achieved significant performance improvements over conventional approaches. In our experiments, the proposed ResNet50-ViT achieved a remarkable 99.86% accuracy on the ArSL dataset, setting a new state-of-the-art, demonstrating the effectiveness of integrating ResNet50’s hierarchical local feature extraction with Vision Transformer’s global contextual modeling. For comparison, a fine-tuned Vision Transformer (SignViT) attained 98.03% accuracy, confirming the strength of transformer-based approaches but underscoring the clear performance gain enabled by our hybrid architecture. We expect that RAFID will help deaf patients communicate better with healthcare providers without needing human interpreters. Full article
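
A minimal sketch of the kind of hybrid the abstract describes, assuming the common pattern of re-tokenizing a ResNet50 feature map for a Transformer encoder; class count, embedding width, and depth are placeholders rather than the paper's configuration.

```python
# Hedged sketch (assumptions, not the paper's code): a ResNet50 backbone whose
# 7x7 feature map is re-tokenized and passed through a Transformer encoder,
# mirroring the "local CNN features + global ViT context" idea.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ResNet50ViTSketch(nn.Module):
    def __init__(self, num_classes=128, d_model=256, nhead=8, depth=4):
        super().__init__()
        backbone = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # -> (B, 2048, 7, 7)
        self.proj = nn.Linear(2048, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                                 # x: (B, 3, 224, 224)
        f = self.cnn(x).flatten(2).transpose(1, 2)        # (B, 49, 2048) spatial tokens
        tokens = torch.cat([self.cls.expand(x.size(0), -1, -1), self.proj(f)], dim=1)
        return self.head(self.encoder(tokens)[:, 0])      # classify from the CLS token

logits = ResNet50ViTSketch()(torch.randn(2, 3, 224, 224))
print(logits.shape)                                       # torch.Size([2, 128])
```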

29 pages, 3413 KB  
Article
Multimodal Communication Outcomes for Hispanic Autistic Preschoolers Following Coached Student Clinician and Caregiver-Led NDBIs
by Cindy Gevarter, Jaime Branaman, Jessica Nico, Erin Gallegos and Richelle McGuire
Behav. Sci. 2025, 15(10), 1425; https://doi.org/10.3390/bs15101425 - 20 Oct 2025
Viewed by 356
Abstract
This study examined child outcomes for five minimally verbal (or non-speaking) autistic preschoolers who participated in cascading coaching programs in which naturalistic developmental behavioral intervention (NDBI) techniques were taught to graduate student clinicians and Hispanic caregivers (three who primarily spoke English, and two who spoke Spanish). While prior studies reported on adult participant outcomes, this study analyzed child multimodal communication outcomes, using multiple baselines/probes single case experimental designs across contexts. Neurodiversity-affirming and culturally responsive principles were embedded within the intervention procedures. Following the introduction of a coached NDBI, all five children (three who received the intervention in English and two who received the intervention in Spanish) demonstrated increased use of (a) the total targeted communicative responses and (b) the targeted unprompted communicative responses, across both student clinician-led and caregiver-led play sessions. The Tau-U effect size measures revealed large-to-very large effects across all of the variables. Overall, higher rates of communication responses were observed during student clinician-led sessions than in caregiver-led sessions. Additionally, behavioral coding of the multimodal response forms (e.g., gestures, aided augmentative and alternative communication, signs, vocal words) using the Communication Matrix revealed that the children used a variety of response topographies during the intervention sessions beyond their preferred communication mode (e.g., signs for three participants). Four of the five children used symbolic communication forms consistently across both caregiver and student clinician-led sessions. Importantly, adults’ reinforcement of pre-symbolic or less advanced communication forms during the intervention did not inhibit the use of more advanced forms. Full article
(This article belongs to the Special Issue Early Identification and Intervention of Autism)

14 pages, 1917 KB  
Article
Moroccan Sign Language Recognition with a Sensory Glove Using Artificial Neural Networks
by Hasnae El Khoukhi, Assia Belatik, Imane El Manaa, My Abdelouahed Sabri, Yassine Abouch and Abdellah Aarab
Digital 2025, 5(4), 53; https://doi.org/10.3390/digital5040053 - 8 Oct 2025
Viewed by 652
Abstract
Every day, countless individuals with hearing or speech disabilities struggle to communicate effectively, as their conditions limit conventional verbal interaction. For them, sign language becomes an essential and often sole tool for expressing thoughts and engaging with others. However, the general public’s limited understanding of sign language poses a major barrier, often resulting in social, educational, and professional exclusion. To bridge this communication gap, the present study proposes a smart wearable glove system designed to translate Arabic sign language (ArSL), especially Moroccan sign language (MSL), into a written alphabet in real time. The glove integrates five MPU6050 motion sensors, one on each finger, capable of capturing detailed motion data, including angular velocity and linear acceleration. These motion signals are processed using an Artificial Neural Network (ANN), implemented directly on a Raspberry Pi Pico through embedded machine learning techniques. A custom dataset comprising labeled gestures corresponding to the MSL alphabet was developed for training the model. Following the training phase, the neural network attained a gesture recognition accuracy of 98%, reflecting strong performance in terms of reliability and classification precision. We developed an affordable and portable glove system aimed at improving daily communication for individuals with hearing impairments in Morocco, contributing to greater inclusivity and improved accessibility. Full article
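
For illustration only, a tiny dense network of the kind that could be trained offline and then run on a Raspberry Pi Pico; the 5 × 6-axis input layout follows the article, while layer sizes, weights, and the 28-letter output are placeholders.

```python
# Illustrative sketch only: a tiny dense network one might train offline and then
# port to a Raspberry Pi Pico. Layer sizes, weights, and the label set are placeholders.
import numpy as np

N_IN = 5 * 6               # 5 finger-mounted MPU6050s, each giving 3-axis accel + 3-axis gyro
N_HIDDEN, N_OUT = 32, 28   # hidden width and alphabet size are assumptions

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((N_IN, N_HIDDEN)) * 0.1, np.zeros(N_HIDDEN)
W2, b2 = rng.standard_normal((N_HIDDEN, N_OUT)) * 0.1, np.zeros(N_OUT)

def predict_letter(imu_frame):
    """imu_frame: flat array of 30 values read from the glove for one gesture."""
    h = np.maximum(imu_frame @ W1 + b1, 0.0)              # ReLU hidden layer
    logits = h @ W2 + b2
    return int(np.argmax(logits))                          # index into the MSL alphabet

sample = rng.standard_normal(N_IN)                         # stand-in for a real sensor read
print("predicted class index:", predict_letter(sample))
```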

43 pages, 3034 KB  
Article
Real-Time Recognition of NZ Sign Language Alphabets by Optimal Use of Machine Learning
by Mubashir Ali, Seyed Ebrahim Hosseini, Shahbaz Pervez and Muneer Ahmad
Bioengineering 2025, 12(10), 1068; https://doi.org/10.3390/bioengineering12101068 - 30 Sep 2025
Viewed by 558
Abstract
The acquisition of a person’s first language is one of their greatest accomplishments. Nevertheless, being fluent in sign language presents challenges for many deaf students who rely on it for communication. Effective communication is essential for both personal and professional interactions and is critical for community engagement. However, the lack of a mutually understood language can be a significant barrier. Estimates indicate that a large portion of New Zealand’s disability population is deaf, with an educational approach predominantly focused on oralism, emphasizing spoken language. This makes it essential to bridge the communication gap between the general public and individuals with speech difficulties. The aim of this project is to develop an application that systematically cycles through each letter and number in New Zealand Sign Language (NZSL), assessing the user’s proficiency. This research investigates various machine learning methods for hand gesture recognition, with a focus on landmark detection. In computer vision, identifying specific points on an object—such as distinct hand landmarks—is a standard approach for feature extraction. Evaluation of this system has been performed using machine learning techniques, including Random Forest (RF) Classifier, k-Nearest Neighbours (KNN), AdaBoost (AB), Naïve Bayes (NB), Support Vector Machine (SVM), Decision Trees (DT), and Logistic Regression (LR). The dataset used for model training and testing consists of approximately 100,000 hand gesture expressions, formatted into a CSV dataset for model training. Full article
(This article belongs to the Special Issue AI and Data Science in Bioengineering: Innovations and Applications)
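
The comparison described above can be sketched as follows, with synthetic 21-point hand landmarks standing in for the article's ~100,000-row CSV; the classifier list mirrors the abstract, but all hyperparameters are library defaults rather than the tuned models.

```python
# Minimal sketch: compare the listed classifiers on synthetic landmark features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 21 * 2))          # x, y for 21 hand landmarks per sample (assumed)
y = rng.integers(0, 36, size=5000)               # 26 letters + 10 digits (assumed class count)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "RF": RandomForestClassifier(), "KNN": KNeighborsClassifier(),
    "AB": AdaBoostClassifier(), "NB": GaussianNB(), "SVM": SVC(),
    "DT": DecisionTreeClassifier(), "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    print(name, round(model.fit(Xtr, ytr).score(Xte, yte), 3))
```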

17 pages, 2255 KB  
Article
Electromyography-Based Sign Language Recognition: A Low-Channel Approach for Classifying Fruit Name Gestures
by Kudratjon Zohirov, Mirjakhon Temirov, Sardor Boykobilov, Golib Berdiev, Feruz Ruziboev, Khojiakbar Egamberdiev, Mamadiyor Sattorov, Gulmira Pardayeva and Kuvonch Madatov
Signals 2025, 6(4), 50; https://doi.org/10.3390/signals6040050 - 25 Sep 2025
Viewed by 1125
Abstract
This paper presents a method for recognizing sign language gestures corresponding to fruit names using electromyography (EMG) signals. The proposed system focuses on classification using a limited number of EMG channels, aiming to reduce classification process complexity while maintaining high recognition accuracy. The dataset (DS) contains EMG signal data of 46 hearing-impaired people and descriptions of fruit names, including apple, pear, apricot, nut, cherry, and raspberry, in sign language (SL). Based on the presented DS, gesture movements were classified using five different classification algorithms—Random Forest, k-Nearest Neighbors, Logistic Regression, Support Vector Machine, and neural networks—and the algorithm that gives the best result for gesture movements was determined. The best classification result was obtained during recognition of the word cherry based on the RF algorithm, and 97% accuracy was achieved. Full article
(This article belongs to the Special Issue Advances in Signal Detecting and Processing)
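
A hedged sketch of the low-channel setup: common sEMG features from two channels feeding a Random Forest, the best-performing model in the abstract. Channel count, window length, and the feature set are our assumptions.

```python
# Rough sketch under stated assumptions (channel count, window length, features):
# classify fruit-name gestures from a small number of EMG channels with a Random Forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FRUITS = ["apple", "pear", "apricot", "nut", "cherry", "raspberry"]
rng = np.random.default_rng(0)
windows = rng.standard_normal((600, 2, 512))      # 600 gestures, 2 EMG channels, 512 samples each
labels = rng.integers(0, len(FRUITS), size=600)

def emg_features(w):
    # Per-channel mean absolute value, RMS, and waveform length: common sEMG features.
    mav = np.abs(w).mean(-1)
    rms = np.sqrt((w ** 2).mean(-1))
    wl = np.abs(np.diff(w, axis=-1)).sum(-1)
    return np.concatenate([mav, rms, wl], axis=-1)

X = emg_features(windows)
scores = cross_val_score(RandomForestClassifier(n_estimators=200), X, labels, cv=5)
print("per-fold accuracy:", np.round(scores, 3))
```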

23 pages, 2168 KB  
Article
Interactive Functions of Palm-Up: Cross-Linguistic and Cross-Modal Insights from ASL, American English, LSFB and Belgian French
by Alysson Lepeut and Emily Shaw
Languages 2025, 10(9), 239; https://doi.org/10.3390/languages10090239 - 19 Sep 2025
Cited by 1 | Viewed by 482
Abstract
This study examines the interactive functions of the palm-up across four language ecologies, drawing on comparable corpus data from American Sign Language (ASL)-American English and French Belgian Sign Language (LSFB)-Belgian French. While researchers have examined palm-up in many different spoken and signed language contexts, they have primarily focused on the canonical form and its epistemic variants. Work that directly compares palm-up across modalities and language ecologies remains scarce. This study addresses such gaps by documenting all instances of the palm approaching supination in four language ecologies to analyze its interactive functions cross-linguistically and cross-modally. Capitalizing on an existing typology of interactive gestures, palm-up annotations were conducted using ELAN on a total sample of 48 participants interacting face-to-face in dyads. Findings highlight the multifunctional nature of palm-up in terms of conversational dynamics, with cross-modal differences in the specific interactive use of palm-up between spoken and signed language contexts. These findings underscore the versatility of the palm-up and reinforce its role in conversational dynamics as not merely supplementary but integral to human interaction. Full article
(This article belongs to the Special Issue Non-representational Gestures: Types, Use, and Functions)

10 pages, 2364 KB  
Proceeding Paper
AI-Powered Sign Language Detection Using YOLO-v11 for Communication Equality
by Ivana Lucia Kharisma, Irma Nurmalasari, Yuni Lestari, Salma Dela Septiani, Kamdan and Muchtar Ali Setyo Yudono
Eng. Proc. 2025, 107(1), 83; https://doi.org/10.3390/engproc2025107083 - 8 Sep 2025
Viewed by 1128
Abstract
Communication plays a vital role in conveying messages, expressing emotions, and sharing perceptions, making it a fundamental aspect of human interaction with the environment. For individuals with hearing impairments, sign language serves as an essential communication tool, enabling interaction both within the deaf community and with non-deaf individuals. This study aims to bridge this communication gap by developing a sign language recognition system using the Deep Learning-based YOLO-v11 algorithm. YOLO-v11, a state-of-the-art object detection algorithm, is known for its speed, accuracy, and efficiency. The system uses image recognition to identify hand gestures in ASL and translates them into text or speech, facilitating inclusive communication. The accuracy of the training model is 94.67%, and the accuracy of the testing model is 93.02%, indicating that the model has excellent performance in recognizing sign language from the training and testing datasets. Additionally, the model is very reliable in recognizing the classes “Hello”, “I Love You”, “No”, and “Thank You” with a sensitivity close to or equal to 100%. This research contributes to advancing communication equality for individuals with hearing impairments, promoting inclusivity, and supporting their integration into society. Full article
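
Using the public Ultralytics API, the workflow the abstract describes might look roughly like this; the dataset YAML, image path, and training settings are placeholders, not the authors' configuration.

```python
# Hedged sketch with the public Ultralytics API (not the authors' training setup):
# fine-tune a YOLO11 detector on a small gesture dataset, then run it on one frame.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                                     # pretrained nano model as a starting point
model.train(data="asl_gestures.yaml", epochs=50, imgsz=640)    # placeholder dataset: Hello, I Love You, No, Thank You

results = model("webcam_frame.jpg", conf=0.5)                  # single-image inference on a placeholder frame
for box in results[0].boxes:
    print(model.names[int(box.cls)], float(box.conf))
```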

43 pages, 1528 KB  
Article
Adaptive Sign Language Recognition for Deaf Users: Integrating Markov Chains with Niching Genetic Algorithm
by Muslem Al-Saidi, Áron Ballagi, Oday Ali Hassen and Saad M. Darwish
AI 2025, 6(8), 189; https://doi.org/10.3390/ai6080189 - 15 Aug 2025
Viewed by 1003
Abstract
Sign language recognition (SLR) plays a crucial role in bridging the communication gap between deaf individuals and the hearing population. However, achieving subject-independent SLR remains a significant challenge due to variations in signing styles, hand shapes, and movement patterns among users. Traditional Markov Chain-based models struggle with generalizing across different signers, often leading to reduced recognition accuracy and increased uncertainty. These limitations arise from the inability of conventional models to effectively capture diverse gesture dynamics while maintaining robustness to inter-user variability. To address these challenges, this study proposes an adaptive SLR framework that integrates Markov Chains with a Niching Genetic Algorithm (NGA). The NGA optimizes the transition probabilities and structural parameters of the Markov Chain model, enabling it to learn diverse signing patterns while avoiding premature convergence to suboptimal solutions. In the proposed SLR framework, GA is employed to determine the optimal transition probabilities for the Markov Chain components operating across multiple signing contexts. To enhance the diversity of the initial population and improve the model’s adaptability to signer variations, a niche model is integrated using a Context-Based Clearing (CBC) technique. This approach mitigates premature convergence by promoting genetic diversity, ensuring that the population maintains a wide range of potential solutions. By minimizing gene association within chromosomes, the CBC technique enhances the model’s ability to learn diverse gesture transitions and movement dynamics across different users. This optimization process enables the Markov Chain to better generalize subject-independent sign language recognition, leading to improved classification accuracy, robustness against signer variability, and reduced misclassification rates. Experimental evaluations demonstrate a significant improvement in recognition performance, reduced error rates, and enhanced generalization across unseen signers, validating the effectiveness of the proposed approach. Full article
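
As a toy simplification of the idea (not the paper's algorithm), the sketch below evolves a Markov transition matrix over quantized hand-pose states with a genetic loop and a clearing-style niching step; state count, population size, and niche radius are arbitrary choices.

```python
# Toy sketch: evolve a Markov chain's transition matrix with a GA plus a clearing
# step that preserves diversity among similar candidates. All settings are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, POP, GENS, NICHE_R = 6, 40, 60, 0.8

# Synthetic training sequences of discrete hand-pose states for one sign.
true_T = rng.dirichlet(np.ones(N_STATES) * 0.3, size=N_STATES)

def sample_seq(T, length=30):
    s, out = 0, [0]
    for _ in range(length - 1):
        s = rng.choice(N_STATES, p=T[s])
        out.append(s)
    return out

train = [sample_seq(true_T) for _ in range(50)]

def log_lik(T, seqs):
    return sum(np.log(T[a, b] + 1e-12) for seq in seqs for a, b in zip(seq, seq[1:]))

def mutate(T):
    child = np.clip(T + rng.normal(0, 0.05, T.shape), 1e-6, None)
    return child / child.sum(axis=1, keepdims=True)        # keep rows as probabilities

pop = [rng.dirichlet(np.ones(N_STATES), size=N_STATES) for _ in range(POP)]
for _ in range(GENS):
    fit = np.array([log_lik(T, train) for T in pop])
    # Clearing step: within each niche (similar matrices), only the best keeps its fitness.
    order = np.argsort(-fit)
    cleared = fit.astype(float).copy()
    for i, a in enumerate(order):
        if cleared[a] == -np.inf:
            continue
        for b in order[i + 1:]:
            if cleared[b] > -np.inf and np.abs(pop[a] - pop[b]).sum() < NICHE_R:
                cleared[b] = -np.inf
    parents = [pop[i] for i in np.argsort(-cleared)[: POP // 2]]
    pop = parents + [mutate(parents[rng.integers(len(parents))]) for _ in range(POP - len(parents))]

best = max(pop, key=lambda T: log_lik(T, train))
print("best log-likelihood on training sequences:", round(log_lik(best, train), 1))
```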

23 pages, 5310 KB  
Article
Greek Sign Language Detection with Artificial Intelligence
by Ioannis Panopoulos, Evangelos Topalis, Nikos Petrellis and Loukas Hadellis
Electronics 2025, 14(16), 3241; https://doi.org/10.3390/electronics14163241 - 15 Aug 2025
Viewed by 997
Abstract
Sign language serves as a vital way to communicate with individuals with hearing loss, deafness, or a speech disorder, yet accessibility remains limited, requiring technological advances to bridge the gap. This study presents the first real-time Greek Sign Language recognition system utilizing deep learning and embedded computers. The recognition system is implemented using You Only Look Once (YOLO11X-seg), an advanced object detection model, which is embedded in a Python-based framework. The model is trained to recognize Greek Sign Language letters and an expandable set of specific words, i.e., the model is capable of distinguishing between static hand shapes (letters) and dynamic gestures (words). The most important advantage of the proposed system is its mobility and scalable processing power. The data are recorded using a mobile IP camera (based on Raspberry Pi 4) via a Motion-Joint Photographic Experts Group (MJPEG) Stream. The image is transmitted over a private ZeroTier network to a remote powerful computer capable of quickly processing large sign language models, employing Moonlight streaming technology. Smaller models can run on an embedded computer. The experimental evaluation shows excellent 99.07% recognition accuracy, while real-time operation is supported, with the image frames processed in 42.7 ms (23.4 frames/s), offering remote accessibility without requiring a direct connection to the processing unit. Full article
(This article belongs to the Special Issue Methods for Object Orientation and Tracking)
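
A minimal capture-and-detect loop in the spirit of the setup above; the MJPEG stream URL and ZeroTier addressing are placeholders, the Moonlight relay is omitted, and the weights name simply follows Ultralytics' YOLO11 segmentation naming.

```python
# Minimal sketch of the capture-and-detect loop; URL and weights are placeholders.
import cv2
from ultralytics import YOLO

model = YOLO("yolo11x-seg.pt")                                      # segmentation variant named in the paper
stream = cv2.VideoCapture("http://10.147.17.5:8080/stream.mjpg")    # placeholder MJPEG stream from the Pi camera

while True:
    ok, frame = stream.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]                        # per-frame inference
    for box in results.boxes:
        print(model.names[int(box.cls)], float(box.conf))
    cv2.imshow("GSL detection", results.plot())                     # overlay detected letters/words
    if cv2.waitKey(1) == 27:                                        # Esc to quit
        break
stream.release()
cv2.destroyAllWindows()
```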

20 pages, 5461 KB  
Article
Design and Implementation of a 3D Korean Sign Language Learning System Using Pseudo-Hologram
by Naeun Kim, HaeYeong Choe, Sukwon Lee and Changgu Kang
Appl. Sci. 2025, 15(16), 8962; https://doi.org/10.3390/app15168962 - 14 Aug 2025
Viewed by 869
Abstract
Sign language is a three-dimensional (3D) visual language that conveys meaning through hand positions, shapes, and movements. Traditional sign language education methods, such as textbooks and videos, often fail to capture the spatial characteristics of sign language, leading to limitations in learning accuracy and comprehension. To address this, we propose a 3D Korean Sign Language Learning System that leverages pseudo-hologram technology and hand gesture recognition using Leap Motion sensors. The proposed system provides learners with an immersive 3D learning experience by visualizing sign language gestures through pseudo-holographic displays. A Recurrent Neural Network (RNN) model, combined with Diffusion Convolutional Recurrent Neural Networks (DCRNNs) and ProbSparse Attention mechanisms, is used to recognize hand gestures from both hands in real-time. The system is implemented using a server–client architecture to ensure scalability and flexibility, allowing efficient updates to the gesture recognition model without modifying the client application. Experimental results show that the system enhances learners’ ability to accurately perform and comprehend sign language gestures. Additionally, a usability study demonstrated that 3D visualization significantly improves learning motivation and user engagement compared to traditional 2D learning methods. Full article
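
A simplified stand-in for the recognition component only (a plain GRU rather than the paper's DCRNN/ProbSparse model), wrapped behind a predict() boundary to reflect the server-client split; joint counts and the sign vocabulary size are assumptions.

```python
# Simplified stand-in (not the paper's model): a recurrent classifier over two-hand
# joint sequences, exposed through predict() as a server-side entry point.
import torch
import torch.nn as nn

class TwoHandGRU(nn.Module):
    def __init__(self, joints_per_hand=21, feat=3, hidden=128, num_signs=50):
        super().__init__()
        in_dim = 2 * joints_per_hand * feat              # both hands, xyz per joint (assumed layout)
        self.rnn = nn.GRU(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_signs)

    def forward(self, seq):                              # seq: (B, T, 2*21*3)
        _, h = self.rnn(seq)
        return self.head(h[-1])

model = TwoHandGRU().eval()

@torch.no_grad()
def predict(frames):
    """frames: list of per-frame joint arrays streamed from the Leap Motion client."""
    seq = torch.tensor(frames, dtype=torch.float32).unsqueeze(0)
    return int(model(seq).argmax(dim=-1))

print(predict(torch.randn(60, 126).tolist()))            # 60 frames of dummy joint data
```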

18 pages, 261 KB  
Article
Transhumanism, Religion, and Techno-Idolatry: A Derridean Response to Tirosh-Samuelson
by Michael G. Sherbert
Religions 2025, 16(8), 1028; https://doi.org/10.3390/rel16081028 - 9 Aug 2025
Viewed by 1230
Abstract
This paper critiques Hava Tirosh-Samuelson’s view of transhumanism as techno-idolatry by applying Derrida’s notion of the unconditional “to-come” and the generalized fetish. While acknowledging Tirosh-Samuelson’s stance that fetishes should not be reduced to idols, I argue that she fails to extend this understanding to transhumanism, instead depicting its fetishes as fixed idols. Drawing on Derrida’s notion of the generalized fetish, I argue that religious objects in Judaism (like the shofar or tefillin) function not as objects of worship but as material mediators of divine relation—tangible signs that carry symbolic, spiritual, and covenantal meaning while gesturing toward the divine without claiming to contain or represent it. Similarly, in transhumanism, brain-computer interfaces and AI act as fetishes that extend human capability and potential while remaining open to future reinterpretation. These fetishes, reflecting Derrida’s idea of the unconditional “to-come,” resist closure and allow for ongoing change and reinterpretation. By reducing transhumanism to mere idolatry, Tirosh-Samuelson overlooks how technological fetishes function as dynamic supplements, open to future possibilities and ongoing reinterpretation, which can be both beneficial and harmful to humanity now and in the future. Full article
(This article belongs to the Special Issue Religion and/of the Future)
20 pages, 16450 KB  
Article
A Smart Textile-Based Tactile Sensing System for Multi-Channel Sign Language Recognition
by Keran Chen, Longnan Li, Qinyao Peng, Mengyuan He, Liyun Ma, Xinxin Li and Zhenyu Lu
Sensors 2025, 25(15), 4602; https://doi.org/10.3390/s25154602 - 25 Jul 2025
Viewed by 1031
Abstract
Sign language recognition plays a crucial role in enabling communication for deaf individuals, yet current methods face limitations such as sensitivity to lighting conditions, occlusions, and lack of adaptability in diverse environments. This study presents a wearable multi-channel tactile sensing system based on smart textiles, designed to capture subtle wrist and finger motions for static sign language recognition. The system leverages triboelectric yarns sewn into gloves and sleeves to construct a skin-conformal tactile sensor array, capable of detecting biomechanical interactions through contact and deformation. Unlike vision-based approaches, the proposed sensor platform operates independently of environmental lighting or occlusions, offering reliable performance in diverse conditions. Experimental validation on American Sign Language letter gestures demonstrates that the proposed system achieves high signal clarity after customized filtering, leading to a classification accuracy of 94.66%. Experimental results show effective recognition of complex gestures, highlighting the system’s potential for broader applications in human-computer interaction. Full article
(This article belongs to the Special Issue Advanced Tactile Sensors: Design and Applications)
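
An illustrative pipeline for the kind of data described: low-pass filtering of multi-channel textile signals followed by simple statistics and a classifier. Filter settings, sampling rate, channel count, and the SVM choice are our assumptions, not the paper's customized filtering or model.

```python
# Illustrative pipeline only: smooth multi-channel textile sensor signals, extract
# per-channel statistics, and classify static ASL letters. All settings are assumed.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
fs, n_channels = 100, 10                                   # 10 yarn channels at 100 Hz (assumed)
signals = rng.standard_normal((2600, n_channels, 300))     # one 3 s window per letter sample
labels = rng.integers(0, 26, size=2600)

b, a = butter(4, 5, btype="low", fs=fs)                    # stand-in for the customized filtering
filtered = filtfilt(b, a, signals, axis=-1)

X = np.concatenate([filtered.mean(-1), filtered.std(-1), filtered.max(-1)], axis=-1)
Xtr, Xte, ytr, yte = train_test_split(X, labels, test_size=0.2, random_state=0)
print("accuracy:", round(SVC().fit(Xtr, ytr).score(Xte, yte), 3))
```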

20 pages, 2786 KB  
Article
Inverse Kinematics-Augmented Sign Language: A Simulation-Based Framework for Scalable Deep Gesture Recognition
by Binghao Wang, Lei Jing and Xiang Li
Algorithms 2025, 18(8), 463; https://doi.org/10.3390/a18080463 - 24 Jul 2025
Viewed by 646
Abstract
In this work, we introduce IK-AUG, a unified algorithmic framework for kinematics-driven data augmentation tailored to sign language recognition (SLR). Departing from traditional augmentation techniques that operate at the pixel or feature level, our method integrates inverse kinematics (IK) and virtual simulation to synthesize anatomically valid gesture sequences within a structured 3D environment. The proposed system begins with sparse 3D keypoints extracted via a pose estimator and projects them into a virtual coordinate space. A differentiable IK solver based on forward-and-backward constrained optimization is then employed to reconstruct biomechanically plausible joint trajectories. To emulate natural signer variability and enhance data richness, we define a set of parametric perturbation operators spanning spatial displacement, depth modulation, and solver sensitivity control. These operators are embedded into a generative loop that transforms each original gesture sample into a diverse sequence cluster, forming a high-fidelity augmentation corpus. We benchmark our method across five deep sequence models (CNN3D, TCN, Transformer, Informer, and Sparse Transformer) and observe consistent improvements in accuracy and convergence. Notably, Informer achieves 94.1% validation accuracy with IK-AUG enhanced training, underscoring the framework’s efficacy. These results suggest that algorithmic augmentation via kinematic modeling offers a scalable, annotation free pathway for improving SLR systems and lays the foundation for future integration with multi-sensor inputs in hybrid recognition pipelines. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
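
A toy version of the augmentation loop using a FABRIK-style forward-and-backward pass on a single finger chain; the paper's differentiable solver, perturbation operators, and virtual-simulation environment are not reproduced, and bone lengths and jitter scale are arbitrary.

```python
# Toy sketch of the augmentation idea: jitter a fingertip target, then re-solve the
# joint chain with forward-and-backward (FABRIK-style) passes so segment lengths hold.
import numpy as np

rng = np.random.default_rng(0)

def fabrik(joints, target, n_iter=10):
    """Move a 3D joint chain so its tip reaches `target`, preserving segment lengths."""
    joints = joints.astype(float).copy()
    lengths = np.linalg.norm(np.diff(joints, axis=0), axis=1)
    root = joints[0].copy()
    for _ in range(n_iter):
        joints[-1] = target                                # backward pass from the tip
        for i in range(len(joints) - 2, -1, -1):
            d = joints[i] - joints[i + 1]
            joints[i] = joints[i + 1] + d / np.linalg.norm(d) * lengths[i]
        joints[0] = root                                   # forward pass from the root
        for i in range(1, len(joints)):
            d = joints[i] - joints[i - 1]
            joints[i] = joints[i - 1] + d / np.linalg.norm(d) * lengths[i - 1]
    return joints

finger = np.array([[0, 0, 0], [0, 0, 3], [0, 0, 5], [0, 0, 6.5]])   # 3-segment chain (arbitrary)
augmented = []
for _ in range(5):                                         # 5 perturbed variants per pose
    target = finger[-1] + rng.normal(0, 0.3, size=3)       # spatial-displacement perturbation
    augmented.append(fabrik(finger, target))
print(len(augmented), "kinematically consistent variants generated")
```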

19 pages, 709 KB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 1008
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs—making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and expedient SLR from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. In this paper, we firstly propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Secondly, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems. Full article
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
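
A sketch of the gated fusion step only, with the MSGCN and D-ResNet branches replaced by stand-in feature vectors; feature dimensions and the exact gating form are assumptions inferred from the abstract.

```python
# Sketch of gated two-stream fusion: a learned gate mixes skeleton-stream and
# RGB-stream features per dimension before classification. Dimensions are assumed.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=256, num_classes=226):          # 226 sign classes as in AUTSL
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.head = nn.Linear(dim, num_classes)

    def forward(self, skel_feat, rgb_feat):                 # features from the two streams
        g = self.gate(torch.cat([skel_feat, rgb_feat], dim=-1))
        return self.head(g * skel_feat + (1 - g) * rgb_feat)   # gate decides the per-dim mix

skel = torch.randn(4, 256)                                  # placeholder skeleton-stream features
rgb = torch.randn(4, 256)                                   # placeholder RGB-stream features
print(GatedFusion()(skel, rgb).shape)                       # torch.Size([4, 226])
```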