1. Introduction
As an indispensable means of communication for people with hearing and speech impairments, sign language has a profound impact on all aspects of their lives [1]. According to the World Health Organization's Global Report on Health Equity for Persons with Disabilities (2022), 1.3 billion of the world's 7.7 billion people live with disabilities, and people with hearing and speech impairments account for a significant proportion of this total, highlighting the urgent need for universal access to sign language and assistive technology.
With rapid advances in Artificial Intelligence (AI) and sensor technology, the integration of wearable devices and human-computer interaction has become a focus of scientific research, not only broadening applications in virtual reality (VR) [2] and health monitoring [3], but also showing great promise in sign language recognition [4]. By accurately capturing hand movements, this technology recognizes sign language and converts it into other forms of output, opening a new channel of communication and learning for people with disabilities.
In the field of sign language recognition using wearable devices [5,6], researchers have made numerous invaluable contributions.
Table 1 summarizes eight different methods for static sign language recognition.
From Table 1, it is evident that gesture recognition technology in the field of machine vision has garnered significant attention due to its convenience, high efficiency, and real-time processing capabilities. Medhanit Y. Alemu's team [7] employed artificial skin and a Multi-Layer Perceptron (MLP) method, achieving an accuracy of 91.13%. However, this approach is associated with high manufacturing costs and complex data processing. Similarly, Chaithanya Kumar Mammadli's team [8] utilized Inertial Measurement Unit (IMU) data gloves with MLP, attaining a recognition accuracy of 92%. While these results are promising, they also highlight challenges related to data handling and equipment expenses.
To address the impact of environmental factors, such as lighting and skin color variations, Paolo Sernani's team [9] explored surface electromyography (sEMG) signals and employed a Long Short-Term Memory (LSTM) network, achieving a recognition rate of 97%. Despite the advantages of sEMG, its complex data processing may hinder its applicability in real-time scenarios.
In contrast to the aforementioned methods, Liufeng Fan and his research team [10] leveraged flexible bending sensors in combination with Convolutional Neural Networks (CNN) and Bidirectional LSTM (BiLSTM), achieving a recognition accuracy of 98.21%. This technique has gained favor among researchers due to its low cost and straightforward data processing.
Furthermore, scholars such as Yinlong Zhu [11] and Weixin Deng [12] attained recognition accuracies of 98.5% and 99.2%, respectively, using BP Neural Networks and Support Vector Machines (SVM). Jungpil Shin's team [13] utilized a camera with SVM to achieve a recognition accuracy of 99.39%. Meanwhile, C.K.M. Lee's team [14] employed Leap Motion controllers integrated with Recurrent Neural Networks (RNN), attaining a remarkable recognition rate of 99.44%. However, this method faces limitations in cross-platform applications.
These studies underscore the substantial potential of various technologies for gesture recognition and, in particular, highlight the flexibility and real-time processing advantages of portable devices, which have made them highly regarded in this field.
Beyond pursuing recognition accuracy, however, none of the above methods examines in detail a problem that arises in real-world use: when a person with a speech impairment signs while wearing a wearable device, it is difficult to guarantee that every movement matches the precision of a standard gesture, which directly reduces recognition accuracy.
To address this, this paper proposes a static sign language recognition method based on self-attention enhancement. A weight function first highlights the key features for sign language gesture classification; the self-attention mechanism then assigns higher attention to these key features, and a convolutional neural network performs feature extraction and classification. The proposed method effectively mitigates the loss of recognition accuracy caused by beginners' non-standard sign language movements, thereby improving the accuracy and robustness of the system.
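For illustration only, the following is a minimal sketch, in PyTorch, of how a per-feature weight function, a self-attention layer, and a small convolutional classifier could be chained for an 11-channel glove sample (5 flex-sensor values plus 6 inertial values); the layer sizes, the learnable weight vector, and the module names are assumptions made for exposition, not the exact architecture detailed in Section 3.

```python
# Minimal sketch (PyTorch), assuming an 11-dimensional glove sample:
# 5 flex-sensor readings followed by 6 inertial values. Illustrative only.
import torch
import torch.nn as nn

class SelfAttentionEnhancedClassifier(nn.Module):
    def __init__(self, n_features=11, embed_dim=32, n_classes=36):
        super().__init__()
        # Learnable per-feature weights stand in for the "weight function"
        # that emphasizes key features before attention is applied.
        self.feature_weights = nn.Parameter(torch.ones(n_features))
        self.embed = nn.Linear(1, embed_dim)              # lift each scalar reading to a token
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.cnn = nn.Sequential(                         # 1-D CNN over the feature tokens
            nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                                 # x: (batch, n_features)
        x = x * self.feature_weights                      # emphasize key features
        tokens = self.embed(x.unsqueeze(-1))              # (batch, n_features, embed_dim)
        attended, _ = self.attn(tokens, tokens, tokens)   # self-attention over features
        feats = self.cnn(attended.transpose(1, 2))        # (batch, 64, 1)
        return self.classifier(feats.squeeze(-1))         # logits over the 36 gestures

# Example: classify a batch of 8 random glove readings.
logits = SelfAttentionEnhancedClassifier()(torch.randn(8, 11))
print(logits.shape)  # torch.Size([8, 36])
```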
Contributions
This paper makes the following contributions to the field of static gesture recognition:
Developed a data glove integrated with 5 flex sensors and a 6-axis gyroscope for gesture data collection.
Proposed a gesture recognition method capable of identifying 36 static gestures, with two main improvements: a noise injection module that increases sample diversity, and a self-attention enhancement module that raises the weight of key features during feature extraction.
Created a dataset of 36 static gestures based on the data collected using the developed glove; a possible per-sample layout is sketched after this list.
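As a purely illustrative sketch of how one sample of such a dataset might be laid out (the channel ordering, field names, and file format below are assumptions, not the paper's actual schema), each record can be stored as 11 sensor values plus an integer label in 0–35:

```python
import numpy as np

# Illustrative layout only: one row per gesture sample, with 5 flex-sensor angles
# followed by 6 values from the 6-axis inertial module (assumed 3 accelerometer
# + 3 angular-rate axes), plus an integer gesture label in 0-35.
n_samples, n_features, n_classes = 1000, 11, 36
features = np.random.rand(n_samples, n_features).astype(np.float32)  # placeholder readings
labels = np.random.randint(0, n_classes, size=n_samples)             # placeholder labels

np.savez("static_gesture_dataset.npz", features=features, labels=labels)
data = np.load("static_gesture_dataset.npz")
print(data["features"].shape, data["labels"].shape)  # (1000, 11) (1000,)
```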
Table 2 presents a comprehensive comparison of the contributions made by the proposed method in this paper against various state-of-the-art approaches introduced by other scholars.
The remainder of this paper is structured as follows: Section 2 explains the overall system architecture, the data collection process, and the composition of the dataset; Section 3 provides a detailed description of the proposed methodology; Section 4 presents and analyzes the experimental results; and Section 5 offers the conclusions of this study.
5. Conclusions
In this paper, a static sign language recognition method based on self-attention enhancement is proposed to address the poor noise immunity and low robustness of wearable devices across different users. The method increases sample diversity through a noise injection module and introduces a self-attention enhancement module that raises the weight of key features during feature extraction, thereby enhancing the robustness of gesture recognition.
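As an illustration of the noise-injection idea, the following is a minimal sketch that assumes the first five channels of a sample are flex-sensor angles in degrees and that the bias range mirrors the ±(0°–9°] setting used in the robustness experiments; the exact augmentation procedure used in this work may differ.

```python
import numpy as np

def inject_angular_noise(sample, low=0.0, high=9.0, rng=None):
    """Return a copy of an 11-value glove sample with a random angular bias of up to
    +/- `high` degrees added to each of the 5 flex-sensor channels (assumed layout)."""
    rng = rng if rng is not None else np.random.default_rng()
    noisy = np.asarray(sample, dtype=np.float64).copy()
    magnitude = rng.uniform(low, high, size=5)     # bias magnitude per finger
    sign = rng.choice([-1.0, 1.0], size=5)         # random bias direction
    noisy[:5] += sign * magnitude                  # perturb only the flex-sensor angles
    return noisy

# Example: generate one augmented copy of a (zeroed) sample.
print(inject_angular_noise(np.zeros(11))[:5])
```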
The experimental results show that the proposed method achieves an accuracy of 99.52% on the standard gesture test set of the new dataset, which was built with reference to the ASL dataset, improving on mainstream domestic and international methods to varying degrees. In the robustness experiments, even under random angular bias in the ranges ±(0°–9°] and ±(9°–18°], the average recognition accuracy remains at 98.63% and 86.33%, respectively, still exceeding the latest mainstream methods and demonstrating excellent noise resistance and robustness.
However, the current research focuses on static gesture recognition, whereas gesture interactions in real applications are usually dynamic and continuous. Future work will therefore address dynamic gesture recognition and the design of more efficient and accurate real-time interaction models.