
Search Results (14)

Search Parameters:
Keywords = American sign language alphabet

22 pages, 7712 KB  
Article
CT-Net: A Hybrid ConvNeXt–Transformer Approach for ASL Alphabet Classification
by Zhuofan Yang, Houjin Lu and Samaneh Shamshiri
Appl. Sci. 2026, 16(10), 5168; https://doi.org/10.3390/app16105168 - 21 May 2026
Viewed by 393
Abstract
Recognition of the American Sign Language (ASL) alphabet is of utmost importance in bridging the communication gap between the hearing-impaired and the hearing. However, robust classification remains difficult because some hand gestures are morphologically very similar. To address this problem, this study presents CT-Net, a hybrid deep learning architecture that integrates ConvNeXt-Tiny with a lightweight Transformer encoder. CT-Net combines convolutional feature extraction and self-attention mechanisms, which enable it to capture fine-grained local patterns and long-range spatial dependencies effectively. The proposed model was extensively compared with various architectures including traditional CNNs, Transformer-based models, hybrid machine-learning approaches and recent lightweight hybrid networks. The experimental results show that CT-Net achieved the best overall performance with a peak accuracy of 95.67% on the enhanced ASL dataset. Ablation studies demonstrate the effectiveness of our design choice. CT-Net achieves a strong trade-off between recognition accuracy and computational efficiency with an inference rate of 163.55 Frames Per Second (FPS). These findings highlight the potential of hybrid frameworks as a powerful tool for fine-grained gesture recognition tasks.

21 pages, 5202 KB  
Article
Real-Time American Sign Language Interpretation Using Deep Learning and Keypoint Tracking
by Bader Alsharif, Easa Alalwany, Ali Ibrahim, Imad Mahgoub and Mohammad Ilyas
Sensors 2025, 25(7), 2138; https://doi.org/10.3390/s25072138 - 28 Mar 2025
Cited by 25 | Viewed by 17875
Abstract
Communication barriers pose significant challenges for the Deaf and Hard-of-Hearing (DHH) community, limiting their access to essential services, social interactions, and professional opportunities. To bridge this gap, assistive technologies leveraging artificial intelligence (AI) and deep learning have gained prominence. This study presents a real-time American Sign Language (ASL) interpretation system that integrates deep learning with keypoint tracking to enhance accessibility and foster inclusivity. By combining the YOLOv11 model for gesture recognition with MediaPipe for precise hand tracking, the system achieves high accuracy in identifying ASL alphabet letters in real time. The proposed approach addresses challenges such as gesture ambiguity, environmental variations, and computational efficiency. Additionally, this system enables users to spell out names and locations, further improving its practical applications. Experimental results demonstrate that the model attains a mean Average Precision (mAP@0.5) of 98.2%, with an inference speed optimized for real-world deployment. This research underscores the critical role of AI-driven assistive technologies in empowering the DHH community by enabling seamless communication and interaction.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))

7 pages, 852 KB  
Proceeding Paper
Improving Hand Pose Recognition Using Localization and Zoom Normalizations over MediaPipe Landmarks
by Miguel Ángel Remiro, Manuel Gil-Martín and Rubén San-Segundo
Eng. Proc. 2023, 58(1), 69; https://doi.org/10.3390/ecsa-10-16215 - 15 Nov 2023
Cited by 9 | Viewed by 4666
Abstract
Hand pose recognition presents significant challenges, such as varying lighting conditions or complex backgrounds, which can hinder accurate and robust hand pose estimation. These can be mitigated by employing MediaPipe to extract representative landmarks efficiently from static images, combined with the use of Convolutional Neural Networks. Extracting these landmarks from the hands reduces the impact of lighting variability and complex backgrounds. However, this process still does not address the variability in the location and size of the hand. Therefore, processing modules that normalize these points with respect to the location of the wrist and the zoom of the hand can significantly mitigate the effects of these variabilities. In all the experiments performed in this work, based on American Sign Language alphabet datasets of 870, 27,000, and 87,000 images, the application of the proposed normalizations resulted in significant improvements in model performance in a resource-limited scenario. In particular, under conditions of high variability, applying both normalizations increased the accuracy by 45.08 percentage points, from 43.94 ± 0.64% to 89.02 ± 0.40%.
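The two normalizations named in the abstract above (location relative to the wrist, and zoom of the hand) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact scaling rule, so dividing by the largest wrist-to-landmark distance is an assumption, as are the function name `normalize_landmarks`, the 2D coordinates, and the sample points. The only external fact relied on is that MediaPipe Hands reports the wrist as landmark index 0.

```python
import math


def normalize_landmarks(landmarks, wrist_index=0):
    """Make 2D hand landmarks independent of hand position and size.

    Location normalization: translate so the wrist sits at the origin.
    Zoom normalization (assumed rule): divide by the largest distance
    from the wrist to any landmark, so the hand has unit extent.
    """
    wrist_x, wrist_y = landmarks[wrist_index]
    centered = [(x - wrist_x, y - wrist_y) for x, y in landmarks]
    scale = max(math.hypot(x, y) for x, y in centered)
    if scale == 0:  # degenerate input: every point coincides with the wrist
        return centered
    return [(x / scale, y / scale) for x, y in centered]


# Hypothetical landmarks: the same pose, then shifted and zoomed 2x.
hand = [(0.50, 0.80), (0.55, 0.70), (0.60, 0.50)]
shifted_and_zoomed = [(2 * x + 0.1, 2 * y - 0.3) for x, y in hand]

a = normalize_landmarks(hand)
b = normalize_landmarks(shifted_and_zoomed)
```

After normalization the two inputs yield the same coordinates up to floating-point error, which is the property a downstream classifier benefits from.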

20 pages, 1550 KB  
Article
Deep Learning Technology to Recognize American Sign Language Alphabet
by Bader Alsharif, Ali Salem Altaher, Ahmed Altaher, Mohammad Ilyas and Easa Alalwany
Sensors 2023, 23(18), 7970; https://doi.org/10.3390/s23187970 - 19 Sep 2023
Cited by 56 | Viewed by 13813
Abstract
Historically, individuals with hearing impairments have faced neglect, lacking the necessary tools to facilitate effective communication. However, advancements in modern technology have paved the way for the development of various tools and software aimed at improving the quality of life for hearing-disabled individuals. This research paper presents a comprehensive study employing five distinct deep learning models to recognize hand gestures for the American Sign Language (ASL) alphabet. The primary objective of this study was to leverage contemporary technology to bridge the communication gap between hearing-impaired individuals and individuals with no hearing impairment. The models utilized in this research, namely AlexNet, ConvNeXt, EfficientNet, ResNet-50, and VisionTransformer, were trained and tested using an extensive dataset comprising over 87,000 images of the ASL alphabet hand gestures. Numerous experiments were conducted, involving modifications to the architectural design parameters of the models to obtain maximum recognition accuracy. The experimental results of our study revealed that ResNet-50 achieved an exceptional accuracy rate of 99.98%, the highest among all models. EfficientNet attained an accuracy rate of 99.95%, ConvNeXt achieved 99.51% accuracy, AlexNet attained 99.50% accuracy, while VisionTransformer yielded the lowest accuracy of 88.59%.
(This article belongs to the Collection Machine Learning and AI for Sensors)

15 pages, 4911 KB  
Article
American Sign Language Alphabet Recognition Using Inertial Motion Capture System with Deep Learning
by Yutong Gu, Sherrine, Weiyi Wei, Xinya Li, Jianan Yuan and Masahiro Todoh
Inventions 2022, 7(4), 112; https://doi.org/10.3390/inventions7040112 - 1 Dec 2022
Cited by 13 | Viewed by 11942
Abstract
Sign language is designed as a natural communication method for the deaf community to convey messages and connect with society. In American sign language, twenty-six special sign gestures from the alphabet are used for the fingerspelling of proper words. The purpose of this research is to classify the hand gestures in the alphabet and recognize a sequence of gestures in the fingerspelling using an inertial hand motion capture system. In this work, time and time-frequency domain features and angle-based features are extracted from the raw data for classification with convolutional neural network-based classifiers. In fingerspelling recognition, we explore two kinds of models: connectionist temporal classification and an encoder-decoder structured sequence recognition model. The study reveals that the classification model achieves an average accuracy of 74.8% for dynamic ASL gestures considering user independence. Moreover, the two proposed sequence recognition models achieve 55.1% and 93.4% accuracy in the word-level evaluation, and 86.5% and 97.9% in the letter-level evaluation of fingerspelling, respectively. The proposed method has the potential to recognize more hand gestures of sign language with highly reliable inertial data from the device.
(This article belongs to the Collection Feature Innovation Papers)
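One of the two sequence models mentioned in the abstract above is connectionist temporal classification (CTC). As a rough illustration of how per-frame CTC outputs become a fingerspelled word, the sketch below applies the standard greedy collapse rule: merge consecutive repeats, then drop the blank symbol. It is not taken from the paper; the function name `ctc_greedy_collapse`, the `_` blank symbol, and the sample frame sequence are all assumptions made for this example.

```python
def ctc_greedy_collapse(frame_labels, blank="_"):
    """Collapse per-frame labels into a letter sequence, CTC style.

    Consecutive repeats are merged first, then blanks are removed.
    A blank between two identical letters keeps them as a double letter.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# Hypothetical per-frame argmax labels for a fingerspelled word.
frames = list("__HH_EE_LL_LL__O_")
word = ctc_greedy_collapse(frames)
```

Here `word` evaluates to `"HELLO"`: the blank between the two runs of `L` is what preserves the double letter.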

17 pages, 2642 KB  
Article
Ensemble Learning of Multiple Deep CNNs Using Accuracy-Based Weighted Voting for ASL Recognition
by Ying Ma, Tianpei Xu, Seokbung Han and Kangchul Kim
Appl. Sci. 2022, 12(22), 11766; https://doi.org/10.3390/app122211766 - 19 Nov 2022
Cited by 17 | Viewed by 3888
Abstract
More than four million people worldwide suffer from hearing loss. Recently, new CNNs and deep ensemble-learning technologies have brought promising opportunities to the image-recognition field, so many studies aiming to recognize American Sign Language (ASL) have been conducted to help these people express their thoughts. This paper proposes an ASL Recognition System using Multiple deep CNNs and accuracy-based weighted voting (ARS-MA) composed of three parts: data preprocessing, feature extraction, and classification. Ensemble learning using multiple deep CNNs based on LeNet, AlexNet, VGGNet, GoogleNet, and ResNet was set up for the feature extraction, and the results were used to create three new datasets for classification. The proposed accuracy-based weighted voting (AWV) algorithm and four existing machine learning algorithms were compared for the classification. Two parameters, α and λ, are introduced to increase the accuracy and reduce the testing time in AWV. The experimental results show that the proposed ARS-MA achieved 98.83% and 98.79% accuracy on the ASL Alphabet and ASLA datasets, respectively.
(This article belongs to the Special Issue Advances in Applied Signal and Image Processing Technology)
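The general idea behind accuracy-based weighted voting can be sketched as follows: each model's vote counts in proportion to an accuracy score, so a few strong models can outvote a larger number of weak ones. This is a generic illustration, not the paper's AWV algorithm. The abstract does not define the roles of α and λ, so they are omitted here, and the function name `accuracy_weighted_vote`, the labels, and the accuracy values are hypothetical.

```python
from collections import defaultdict


def accuracy_weighted_vote(predictions, accuracies):
    """Return the label with the largest summed accuracy weight.

    predictions: one predicted label per model.
    accuracies:  one weight per model (e.g. its validation accuracy).
    """
    scores = defaultdict(float)
    for label, accuracy in zip(predictions, accuracies):
        scores[label] += accuracy
    return max(scores, key=scores.get)


# Hypothetical outputs of five CNNs for one image of a hard letter pair.
predictions = ["M", "N", "N", "M", "M"]
accuracies = [0.62, 0.97, 0.95, 0.60, 0.58]

winner = accuracy_weighted_vote(predictions, accuracies)
```

A plain majority vote would pick `"M"` (three votes to two), while the weighted vote picks `"N"` because the two more accurate models outweigh the three weaker ones (1.92 versus 1.80).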

18 pages, 8483 KB  
Article
Hand Gesture Recognition with Symmetric Pattern under Diverse Illuminated Conditions Using Artificial Neural Network
by Muhammad Haroon, Saud Altaf, Shafiq Ahmad, Mazen Zaindin, Shamsul Huda and Sofia Iqbal
Symmetry 2022, 14(10), 2045; https://doi.org/10.3390/sym14102045 - 30 Sep 2022
Cited by 13 | Viewed by 4586
Abstract
This paper investigated the effects of varying lighting conditions on the recognition process. A framework is proposed to improve the performance of gesture recognition under varying illumination using the luminosity method. To prove the concept, a workable testbed has been developed in the laboratory by using a Microsoft Kinect sensor to capture the depth images for the purpose of acquiring diverse resolution data. For this, a case study was formulated to achieve an improved accuracy rate in gesture recognition under diverse illuminated conditions. For data preparation, American Sign Language (ASL) was used to create a dataset of all twenty-six signs, evaluated in real-time under diverse lighting conditions. The proposed method uses a set of symmetric patterns as a feature set in order to identify human hands and recognize gestures extracted through hand perimeter feature-extraction methods. A Scale-Invariant Feature Transform (SIFT) is used in the identification of significant key points of ASL-based images with their relevant features. Finally, an Artificial Neural Network (ANN) trained on symmetric patterns under different lighting environments was used to classify hand gestures utilizing selected features for validation. The experimental results showed that the proposed system performed well in diverse lighting effects with multiple pixel sizes. An aggregate recognition accuracy of 97.3% is achieved across 26 alphabet datasets with only a 2.7% error rate, which shows the overall efficiency of the ANN architecture in terms of processing time.
(This article belongs to the Special Issue Computer Vision, Pattern Recognition, Machine Learning, and Symmetry)

19 pages, 1961 KB  
Article
American Sign Language Alphabet Recognition by Extracting Feature from Hand Pose Estimation
by Jungpil Shin, Akitaka Matsuoka, Md. Al Mehedi Hasan and Azmain Yakin Srizon
Sensors 2021, 21(17), 5856; https://doi.org/10.3390/s21175856 - 31 Aug 2021
Cited by 114 | Viewed by 14948
Abstract
Sign language is designed to assist the deaf and hard of hearing community to convey messages and connect with society. Sign language recognition has been an important domain of research for a long time. Previously, sensor-based approaches have obtained higher accuracy than vision-based approaches. Due to the cost-effectiveness of vision-based approaches, research has also been conducted here despite the accuracy drop. The purpose of this research is to recognize American sign characters using hand images obtained from a web camera. In this work, the MediaPipe Hands algorithm was used for estimating hand joints from RGB images of hands obtained from a web camera, and two types of features were generated for classification from the estimated coordinates of the joints: one is the distances between the joint points and the other is the angles between vectors and 3D axes. The classifiers utilized to classify the characters were support vector machine (SVM) and light gradient boosting machine (GBM). Three character datasets were used for recognition: the ASL Alphabet dataset, the Massey dataset, and the Finger Spelling A dataset. The results obtained were 99.39% for the Massey dataset, 87.60% for the ASL Alphabet dataset, and 98.45% for the Finger Spelling A dataset. The proposed design for automatic American sign language recognition is cost-effective, computationally inexpensive, does not require any special sensors or devices, and has outperformed previous studies.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)
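The two feature types described in the abstract above (distances between joint points, and angles between joint-to-joint vectors and the 3D axes) can be sketched in a few lines. This is only an illustration under assumptions: the abstract does not say which joint pairs are used, so the sketch takes every pair, and the function names and the three sample joints are hypothetical.

```python
import math
from itertools import combinations


def distance_features(joints):
    """Euclidean distance between every pair of 3D joints (assumed pairing)."""
    return [math.dist(a, b) for a, b in combinations(joints, 2)]


def axis_angle_features(joints):
    """Angles in radians between each joint-to-joint vector and the x, y, z axes."""
    angles = []
    for a, b in combinations(joints, 2):
        vector = [q - p for p, q in zip(a, b)]
        norm = math.sqrt(sum(c * c for c in vector))
        for component in vector:
            if norm == 0:
                angles.append(0.0)  # coincident joints: direction undefined
            else:
                # The cosine of the angle to an axis is component / norm;
                # clamp to guard against floating-point overshoot.
                cosine = max(-1.0, min(1.0, component / norm))
                angles.append(math.acos(cosine))
    return angles


# Hypothetical joints; a full MediaPipe Hands output has 21 landmarks.
joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
dists = distance_features(joints)
angles = axis_angle_features(joints)
```

With three joints this produces 3 distances and 9 angles. Both feature types depend only on the relative geometry of the joints, not on where the hand appears in the image, and the resulting vectors could be passed to a classifier such as an SVM.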

24 pages, 3755 KB  
Article
Optimization of Convolutional Neural Networks Architectures Using PSO for Sign Language Recognition
by Jonathan Fregoso, Claudia I. Gonzalez and Gabriela E. Martinez
Axioms 2021, 10(3), 139; https://doi.org/10.3390/axioms10030139 - 29 Jun 2021
Cited by 53 | Viewed by 7631
Abstract
This paper presents an approach to design convolutional neural network architectures using the particle swarm optimization (PSO) algorithm. Adjusting the hyper-parameters and finding the optimal network architecture of convolutional neural networks represent an important challenge. Network performance and achieving efficient learning models for a particular problem depend on setting hyper-parameter values, and this implies exploring a huge and complex search space. The use of heuristic-based searches supports these types of problems; therefore, the main contribution of this research work is to apply the PSO algorithm to find the optimal parameters of the convolutional neural networks, which include the number of convolutional layers, the filter size used in the convolutional process, the number of convolutional filters, and the batch size. This work describes two optimization approaches. In the first, the parameters obtained by PSO are kept under the same conditions in each convolutional layer, and the objective function evaluated by PSO is given by the classification rate. In the second, the PSO generates different parameters per layer, and the objective function is composed of the recognition rate in conjunction with the Akaike information criterion; the latter helps to find the best network performance with the minimum number of parameters. The optimized architectures are implemented in three study cases of sign language databases: the Mexican Sign Language alphabet, the American Sign Language MNIST, and the American Sign Language alphabet. According to the results, the proposed methodologies achieved favorable results with a recognition rate higher than 99%, showing competitive results compared to other state-of-the-art approaches.
(This article belongs to the Special Issue Various Deep Learning Algorithms in Computational Intelligence)

18 pages, 4958 KB  
Article
Spelling Correction Real-Time American Sign Language Alphabet Translation System Based on YOLO Network and LSTM
by Miguel Rivera-Acosta, Juan Manuel Ruiz-Varela, Susana Ortega-Cisneros, Jorge Rivera, Ramón Parra-Michel and Pedro Mejia-Alvarez
Electronics 2021, 10(9), 1035; https://doi.org/10.3390/electronics10091035 - 27 Apr 2021
Cited by 34 | Viewed by 6529
Abstract
In this paper, we present a novel approach that aims to solve one of the main challenges in hand gesture recognition tasks in static images: compensating for the accuracy lost when trained models are used to interpret completely unseen data. The model presented here consists of two main data-processing stages. In the first stage, a deep neural network (DNN) performs handshape segmentation and classification; multiple architectures and input image sizes were tested and compared to derive the best model in terms of accuracy and processing time. For the experiments presented in this work, the DNN models were trained with 24,000 images of 24 signs from the American Sign Language alphabet and fine-tuned with 5200 images of 26 generated signs. The system was tested in real time with a community of 10 persons, yielding a mean average precision and processing rate of 81.74% and 61.35 frames-per-second, respectively. As a second data-processing stage, a bidirectional long short-term memory neural network was implemented and analyzed for adding spelling correction capability to our system, which scored a training accuracy of 98.07% with a dictionary of 370 words, thus increasing the robustness in completely unseen data, as shown in our experiments.
(This article belongs to the Special Issue Deep Learning for Computer Vision and Pattern Recognition)

22 pages, 5952 KB  
Article
Development of Sign Language Motion Recognition System for Hearing-Impaired People Using Electromyography Signal
by Shigeyuki Tateno, Hongbin Liu and Junhong Ou
Sensors 2020, 20(20), 5807; https://doi.org/10.3390/s20205807 - 14 Oct 2020
Cited by 44 | Viewed by 6622
Abstract
Sign languages are developed around the world for hearing-impaired people to communicate with others who understand them. Different grammar and alphabets limit the usage of sign languages between different sign language users. Furthermore, training is required for hearing-intact people to communicate with them. Therefore, in this paper, a real-time motion recognition system based on an electromyography signal is proposed for recognizing actual American Sign Language (ASL) hand motions, both to help hearing-impaired people communicate with others and to train hearing-intact people to understand sign languages. A bilinear model is applied to the electromyography (EMG) data to decrease the individual differences among people. A long short-term memory neural network is used in this paper as the classifier. Twenty sign language motions in the ASL library are selected for recognition in order to increase the practicability of the system. The results indicate that this system can recognize these twenty motions with high accuracy among twenty participants. Therefore, this system has the potential to be widely applied to help hearing-impaired people in daily communication and hearing-intact people to understand sign languages.
(This article belongs to the Collection Sensors for Gait, Human Movement Analysis, and Health Monitoring)

15 pages, 1018 KB  
Article
Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine
by Razieh Rastgoo, Kourosh Kiani and Sergio Escalera
Entropy 2018, 20(11), 809; https://doi.org/10.3390/e20110809 - 23 Oct 2018
Cited by 96 | Viewed by 9296
Abstract
In this paper, a deep learning approach, Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how RBM, as a deep generative model, is capable of generating the distribution of the input data for an enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used, and the hands in these cropped images are detected using a Convolutional Neural Network (CNN). After that, three types of the detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for the two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on the Massey University Gesture Dataset 2012, the American Sign Language (ASL) Fingerspelling Dataset from the University of Surrey’s Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A datasets.
(This article belongs to the Special Issue Statistical Machine Learning for Human Behaviour Analysis)

17 pages, 5078 KB  
Article
American Sign Language Alphabet Recognition Using a Neuromorphic Sensor and an Artificial Neural Network
by Miguel Rivera-Acosta, Susana Ortega-Cisneros, Jorge Rivera and Federico Sandoval-Ibarra
Sensors 2017, 17(10), 2176; https://doi.org/10.3390/s17102176 - 22 Sep 2017
Cited by 35 | Viewed by 12053
Abstract
This paper reports the design and analysis of an American Sign Language (ASL) alphabet translation system implemented in hardware using a Field-Programmable Gate Array. The system process consists of three stages, the first being the communication with the neuromorphic camera (also called a Dynamic Vision Sensor, DVS) using the Universal Serial Bus protocol. The feature extraction of the events generated by the DVS is the second part of the process, consisting of a presentation of the digital image processing algorithms developed in software, which aim to reduce redundant information and prepare the data for the third stage. The last stage of the system process is the classification of the ASL alphabet, achieved with a single artificial neural network implemented in digital hardware for higher speed. The overall result is the development of a classification system using the contours of the ASL signs, fully implemented in a reconfigurable device. The experimental results consist of a comparative analysis of the recognition rate among the alphabet signs using the neuromorphic camera in order to prove the proper operation of the digital image processing algorithms. In the experiments performed with 720 samples of 24 signs, a recognition accuracy of 79.58% was obtained.
(This article belongs to the Special Issue Video Analysis and Tracking Using State-of-the-Art Sensors)

26 pages, 1468 KB  
Article
Towards Real-Time and Rotation-Invariant American Sign Language Alphabet Recognition Using a Range Camera
by Hervé Lahamy and Derek D. Lichti
Sensors 2012, 12(11), 14416-14441; https://doi.org/10.3390/s121114416 - 29 Oct 2012
Cited by 25 | Viewed by 7027
Abstract
The automatic interpretation of human gestures can be used for a natural interaction with computers while getting rid of mechanical devices such as keyboards and mice. In order to achieve this objective, the recognition of hand postures has been studied for many years. However, most of the literature in this area has considered 2D images, which cannot provide a full description of the hand gestures. In addition, a rotation-invariant identification remains an unsolved problem, even with the use of 2D images. The objective of the current study was to design a rotation-invariant recognition process while using a 3D signature for classifying hand postures. A heuristic and voxel-based signature has been designed and implemented. The tracking of the hand motion is achieved with the Kalman filter. A unique training image per posture is used in the supervised classification. The designed recognition process, the tracking procedure and the segmentation algorithm have been successfully evaluated. This study has demonstrated the efficiency of the proposed rotation-invariant 3D hand posture signature, which leads to a 93.88% recognition rate after testing 14,732 samples of 12 postures taken from the alphabet of the American Sign Language.
(This article belongs to the Section Physical Sensors)