1. Introduction
As intelligent devices have been integrated into thousands of households, traditional means of human-computer interaction can no longer meet users' growing needs. A series of emerging technologies, such as face recognition, gesture recognition, and body posture recognition, has transformed the way humans interact with computers. As one of the core technologies of this new form of human-computer interaction, gesture recognition has become an active research topic [1]. Gestures differ from traditional input methods (such as the mouse, keyboard, and touch screen) in that they are simple, intuitive, easy to learn, and able to express ideas naturally, offering users a friendlier experience. Gesture recognition technology has been widely applied to smart homes, sign language recognition, robot remote control, and virtual reality devices [2]. There are nearly 70 million deaf and hearing-impaired people in the world, for whom sign language is a key tool for communicating with the world on equal terms. In specific scenarios, such as legal proceedings or cultural exchange, hearing-impaired people need to communicate with others who are unfamiliar with sign language. Although intelligent interactive technology and sign language interpreters partly address this need, existing gesture recognition technologies lack universality: they are expensive and inconvenient to carry.
Existing gesture recognition systems can be divided into three categories: computer vision, wearable sensors, and radio communication technology. The computer vision approach captures a video stream of gesture movements through optical cameras, extracts gesture features with image processing techniques, and applies rigorous mathematical models for pattern recognition to obtain reliable results. It is currently the most mature approach, and many representative products have emerged, such as Leap's LeapMotion [3] and Microsoft's Kinect [4]. However, because it relies on optical high-definition cameras, this approach fails in low-light and foggy environments, its recognition range is limited by the camera's field of view, and it cannot be used in scenarios where users' privacy must be protected. Gesture recognition systems based on wearable sensors generally require special sensors, such as accelerometers [5], gyroscopes [6], air pressure sensors [7], and surface electromyography sensors [8]. These sensors detect the position and speed of the hand, enabling different gestures to be modeled and distinguished. Well-known examples are the virtual glove CyberGlove [9] and the wearable device Baidu Eye [10]. CyberGlove uses 22 sensors integrated into a glove to sense the joint motion of the human hand and converts hand and finger movements into digital data; Baidu Eye identifies objects in the air by recognizing finger movements. Although gesture recognition based on special sensors can directly obtain fine-grained hand and finger motion data and achieve higher recognition accuracy, it cannot be deployed at scale because users must wear additional devices, the sensing distance is limited, and deployment and maintenance are expensive. The radio communication approach mainly exploits the reflection of radio signals off the hand for gesture recognition, and several methods exist depending on the radio equipment and detection metrics used. WiSee [11] is a gesture recognition system based on the Universal Software Radio Peripheral (USRP); by analyzing the Doppler effect of WiFi signals, it recognizes nine commonly used gestures in an indoor environment. WiZ [12] uses Frequency Modulated Continuous Wave (FMCW) technology and analyzes the signal's travel time from transmitter to receiver as affected by gestures, realizing user action recognition and three-dimensional gesture pointing. Radio-based methods require no body-worn equipment, impose few environmental constraints, and achieve high recognition accuracy, but transmitting these radio signals requires costly special-purpose equipment, so they are not universal and cannot be popularized in daily life.
The systems above suffer from problems such as high cost, low universality, and high intrusiveness, and can no longer meet the need for gesture recognition applications to blend into daily life, especially for specific needs such as sign language recognition. In recent years, the rapid development of wireless sensing technology and the ubiquity of WiFi equipment have provided a new way to overcome these limitations. WiFi-based Received Signal Strength (RSS) has been widely used in indoor positioning [13,14], and RSS has also made some progress in gesture recognition; the most representative system is Wi-Gest [15], an application that recognizes gestures from the changes that hand movements induce in the RSS of WiFi devices. However, because RSS is a Media Access Control (MAC) layer measurement, it suffers from inherent defects such as instability, coarse granularity, and vulnerability to environmental factors, so it cannot recognize gestures with high precision. The emergence of CSI solves this problem. As a fine-grained and stable channel feature of the physical layer, CSI is available on commercial WiFi network cards and exposes the subcarrier-level characteristics of the wireless channel through Orthogonal Frequency Division Multiplexing (OFDM) demodulation. Compared with RSS, it offers better temporal stability and finer frequency-domain channel information. CSI has been studied extensively for high-precision indoor positioning [16,17], human motion recognition [18,19], and human behavior detection [20,21]. Some achievements have already been made in CSI-based gesture recognition. WiG [22] first used device-free sensing with commodity WiFi network cards and ordinary routers to achieve fine-grained, high-precision gesture recognition. Wi-Finger [23] recognizes the finger gestures for digits 1–9 by associating user gestures with the CSI signal changes that different gestures cause. WiAG [24] uses the changes that human gestures induce in the parameters of a Channel Frequency Response (CFR) model to realize position- and orientation-independent gesture recognition. WiGeR [25] extracts CSI from commercial devices and uses the Dynamic Time Warping (DTW) algorithm to associate specific hand movements, recognizing 7 gestures that can control smart homes. Wi-Sign [26] uses three WiFi transceivers to associate gesture movements with CSI waveform changes and recognizes eight commonly used sign language gestures. WiMorse [27] uses CSI to sense subtle finger movements and achieves 95% recognition accuracy for Morse code generated by fingers.
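Systems such as WiGeR [25] match a measured CSI waveform against stored gesture templates with DTW, which tolerates the timing variations between repetitions of the same gesture. The following is a minimal, self-contained sketch of the classic DTW distance, not WiGeR's implementation; the signals and values are purely illustrative.

```python
# Illustrative Dynamic Time Warping (DTW) distance between two 1-D
# signals, conceptually how a measured CSI waveform can be matched
# against a gesture template despite time stretching or lag.

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = minimum cumulative cost of aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

# A time-shifted copy of a waveform stays close under DTW, whereas a
# pointwise (Euclidean) comparison would penalize the lag.
template = [0, 0, 1, 2, 3, 2, 1, 0, 0]
shifted  = [0, 0, 0, 1, 2, 3, 2, 1, 0]
print(dtw_distance(template, shifted))  # prints 0.0
```

In a recognition pipeline, the gesture label of the template with the smallest DTW distance to the measured waveform would be reported.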
In this paper, we propose a gesture recognition method, named Wi-SL, that uses the amplitude and phase difference information in CSI under 5.7 GHz wireless signals; 12 gestures (all commonly used sign language actions) are recognized with low-cost, widely deployed commercial WiFi devices.
Figure 1 shows the specific gestures selected in this paper. We exploit the different phase difference information produced by the motion of different parts of the hand and divide the recognized gestures into two categories: finger language gestures (combined actions of the finger joints and hand) and sign language gestures (combined actions of the arm and hand). We filter out environmental interference in the frequency domain with a Butterworth low-pass filter, smooth the CSI data with a wavelet function to ease the extraction of gesture features, and combine K-Means and SVM into a low-complexity KSB classification model to achieve high-precision gesture recognition.
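To make the front end of such a pipeline concrete, the following is a minimal sketch (not the authors' code) of extracting CSI amplitude and inter-antenna phase difference and applying a Butterworth low-pass filter, assuming NumPy and SciPy. The sampling rate, cutoff frequency, and array shapes are illustrative assumptions, and the wavelet smoothing stage is omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_csi(csi, fs=100.0, cutoff=10.0, order=4):
    """Extract amplitude and inter-antenna phase difference from raw
    complex CSI, then low-pass filter the amplitude stream.

    csi    : complex array, shape (n_packets, n_antennas, n_subcarriers)
    fs     : packet sampling rate in Hz (illustrative value)
    cutoff : low-pass cutoff in Hz; hand and arm motion is slow, so
             higher-frequency noise is attenuated (illustrative value)
    """
    amplitude = np.abs(csi)
    # Phase difference between adjacent receive antennas cancels the
    # phase offset terms that are common to all antennas.
    phase = np.angle(csi)
    phase_diff = np.unwrap(phase[:, 1:, :] - phase[:, :-1, :], axis=0)
    # Zero-phase Butterworth low-pass along the time (packet) axis.
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    amp_filtered = filtfilt(b, a, amplitude, axis=0)
    return amp_filtered, phase_diff
```

Gesture features (e.g., waveform statistics per subcarrier) would then be computed from the filtered amplitude and phase-difference streams.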
The main contributions of this work are as follows:
We propose Wi-SL, a device-free, CSI-based recognition method for sign language actions. Under wireless signals in the 5.7 GHz band, subcarrier-level amplitude and phase difference features are correlated with sign language actions to realize intelligent, high-precision, contactless sign language recognition.
We construct an efficient denoising method in the Wi-SL system, using a Butterworth low-pass filter combined with a wavelet function to filter out multipath components and environmental interference and thus ensure the accuracy of sign language recognition. In addition, we design a reasonable optimal subcarrier selection strategy that effectively reduces the system's computational overhead.
In the data classification and fingerprint matching stage of the Wi-SL system, we design an efficient KSB classification model. KSB uses K-Means to cluster the sign language action data and uses the ensemble learning Bagging algorithm, combined with a majority voting strategy, to select the optimal SVM classifier and efficiently classify the sign language feature data.
We test the performance of Wi-SL in three scenarios whose multipath effects range from strong to weak (laboratory, corridor, hall). The experimental results show that the system is highly robust, with an average recognition rate of 95.8% under Line-Of-Sight (LOS) conditions and 89.3% under Non-Line-Of-Sight (NLOS) conditions.
This article is organized as follows.
Section 2 introduces CSI and the background knowledge of channel feature selection and gesture recognition;
Section 3 presents the design of Wi-SL, gives the overall workflow of the method, and elaborates each step in detail;
Section 4 describes the experimental scenarios and key experimental parameter settings, analyzes the factors affecting the experimental results, and evaluates the overall performance of the system; finally,
Section 5 summarizes the paper and provides an outlook on future research.