1. Introduction
The economic consequences of an aging population, published by the United Nations in 1956, established the population classification standard: a country or region is considered to be aging when people aged 65 and above account for more than 7% of its total population [1]. Japan is a seriously aging country, and its population has been decreasing for nine consecutive years. According to statistics from 2019, the population over 65 years old in Japan was 35.885 million (28.4%) [2]. According to data released by China’s National Bureau of Statistics, by the end of 2019 the number of older adults aged 60 and above had reached 254 million (18.1%), and the number of people aged 65 and above had reached 176 million (12.6%) [3].
Therefore, in response to the increasingly serious aging trend, more attention has been paid to monitoring older adults who live alone. Older adults are prone to dangerous events (such as a sudden fall) that damage their health. Existing monitoring methods can be divided into wearable and non-contact methods. Common wearable sensors are bracelets and ankle monitors [4], which need to be worn close to the body and suffer from problems such as low universality, poor portability, a high damage rate, limited functionality, and a high false alarm rate. As a traditional non-contact sensor, the camera places high demands on storage space and information processing capacity and is sensitive to lighting and other conditions. Both camera-based and radar-based observation rely on non-wearable devices. Compared with cameras, however, ultra-wideband (UWB) radar is not restricted by blocking objects and can still monitor targets outside the visible range of the camera. In addition, radar is not affected by lighting conditions and can work in dark environments, which makes it possible to monitor the nocturnal activities of the elderly [5]. Therefore, we propose a non-contact monitoring device based on UWB radar, which can effectively solve the dim-light and blind-area problems.
UWB radar uses the Doppler effect to recognize human posture from the relative motion between the human body and the radar. Ref. [6] used continuous wave radar to extract Doppler features and a trained SVM to recognize seven rhythmic human movements, with a classification accuracy of around 92.8%. Ref. [7] applied Doppler radar to evaluate the Doppler characteristics of human walking and showed that this method effectively recognizes the falling posture of a moving human body. In Ref. [8], the Dynamic Range Doppler Trajectory (DRDT) of human motion was extracted using frequency modulated continuous wave radar. The DRDT was combined with a machine learning classifier to segment continuous motion into individual motions, with an average classification accuracy above 91.9%. In Ref. [9], a deep convolutional neural network (DCNN) was applied to human activity classification based on micro-Doppler features, which turned the task into an image classification problem. Ref. [10] combined Doppler radar with a DCNN and transfer learning to classify and recognize five kinds of human posture. Ref. [11] combined micro-Doppler features with a DCNN to recognize human hand posture. Based on the above analysis, radar-based human posture recognition can classify and recognize highly similar actions within a simple framework and is highly feasible. Because of its high resolution, strong penetration, and low power consumption, UWB radar can overcome the lighting and privacy problems of cameras, support all-weather operation, and avoid the poor performance of optical systems under occlusion. Thus, it plays an essential role in monitoring the elderly.
We propose the PCA-LSTM algorithm. Principal Component Analysis (PCA) is a data analysis method that linearly transforms the original data into a set of linearly uncorrelated components, from which the main feature components of the data can be extracted. PCA reduces the dimensionality of the original data and thereby greatly reduces the complexity of signal processing. Compared with other algorithms, PCA is simple and has no parameter restrictions. Because it extracts the main information components of the signal, it can be used to filter out the noise contained in the signal and to combine signal characteristics. In the classification and recognition of human poses, similar poses produce feature vectors that are strongly correlated. In this case, PCA merges the highly correlated features into one feature, and when two features carry the same information, it filters out the redundant one. In addition, when there are many signal features and overfitting occurs, PCA preserves the useful part of the signal. However, the features of similar actions may be so similar that PCA merges all of their feature vectors. To prevent this, PCA is combined with Long Short-Term Memory (LSTM), a network designed to solve the vanishing gradient problem in long-sequence training. LSTM controls the transmission of its state through gates, so it can remember information that must be retained for a long time and forget unimportant information.
Although a standard Recurrent Neural Network (RNN) can also handle sequential data, it has only a single memory stacking mechanism and, unlike LSTM, is not suitable for long-sequence memory tasks. When monitoring the elderly indoors, the PCA-LSTM algorithm can memorize the slow movements of the monitored person over a long time and classify them accurately, while unimportant normal actions are merged and forgotten. Compared with other algorithms, the PCA-LSTM algorithm requires less data processing and achieves higher accuracy.
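The way PCA merges highly correlated features can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function name `pca_reduce`, the synthetic data, and the choice of a single retained component are assumptions made only for illustration.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    reduced = centered @ vt[:n_components].T
    return reduced, explained[:n_components]

# Hypothetical data: two strongly correlated features plus a weak noise feature.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base,
    2.0 * base + 0.01 * rng.normal(size=(200, 1)),
    0.1 * rng.normal(size=(200, 1)),
])

reduced, ratio = pca_reduce(data, n_components=1)
print(reduced.shape, ratio)
```

In this toy case a single component captures almost all of the variance, because the first two columns carry nearly the same information.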
The main contributions of this paper are twofold. First, an improved algorithm based on PCA-LSTM is proposed that integrates micro-Doppler features and time-sequence information to recognize human posture. Second, the accuracy of different kernel functions in human posture classification is studied.
The rest of the paper is organized as follows. Section 2 discusses some closely related work. Section 3 presents the system model and formally states the considered problem. Section 4 presents the proposed algorithm with theoretical analysis, and Section 5 demonstrates the performance of this algorithm via simulations. Finally, Section 6 briefly concludes the paper.
2. Related Work
The human posture recognition system based on a UWB radar should include four essential parts: human echo data acquisition, echo data processing, feature extraction, and recognition algorithm (see Figure 1). Among them, echo data processing and feature extraction methods can be divided into three categories. The first is based on traditional statistics, including statistical characteristics such as the mean and variance of the radar echo envelope. The second is based on time-frequency transforms, including the Fourier transform, the short-time Fourier transform (STFT), the pseudo-Wigner distribution (PWD), and the wavelet transform. The third is based on component analysis and mainly includes Principal Component Analysis (PCA) and Independent Component Analysis (ICA).
Considering the characteristics of UWB radar and the different feature extraction methods, human posture recognition methods are divided into traditional machine learning methods and deep learning methods. First, the traditional machine learning methods mainly used in human posture recognition include SVM, k-Nearest Neighbour (kNN), and Random Forest (RF). These classification and recognition methods are mainly based on statistical theory and on shallow features extracted from the original echo data, such as the mean, variance, Euclidean distance, and Fourier transform [12]. Feature extraction methods such as PCA and the Discrete Cosine Transform (DCT) are applied, and the prominent shallow features are selected for recognition. Second, methods based on deep learning avoid the tedious manual feature extraction and selection of traditional machine learning methods for UWB radar human posture recognition and improve performance. Compared with traditional machine learning, deep learning automatically learns an effective feature extraction mechanism from the data, without a manually designed feature extraction process. In addition, it can identify complex human activities with fewer processing steps. Ref. [13] proposed a classification and recognition algorithm based on a convolutional neural network (CNN). The field of UWB radar human body recognition has attracted many scholars’ attention, and many derivative network models have been produced. Ref. [11] used UWB radar to collect micro-Doppler features of human gestures and then combined them with an improved DCNN to recognize ten kinds of gestures. In the training phase, 5-fold cross-validation was used to improve the model’s generalization ability, and the average recognition rate was more than 90%, but the method remained sensitive to changes in distance and azimuth. To solve this problem, Refs. [13,14] extracted three robust features from the pre-processed gesture echo signal and defined a data-fitting algorithm that detects the periodicity of gesture motion to eliminate unexpected motion of the hand or body, achieving 95% accuracy.
For radar signal processing, researchers have proposed using time-frequency analysis to obtain the information in the signal. Fourier transform and its improvement methods are the main methods of time-frequency analysis. Ref. [
15] reported the efficiency of the Fourier–Bessel transform and time-frequency-based method in conjunction with the fractional Fourier transform. The short-time Fourier transform processes the signal by selecting different window functions. This method can determine the frequency and phase of the sine wave in the time-varying signal’s local area [
16,
17,
18]. Wavelet transform (WT) is also used in radar signal processing. Compared with the Fourier transform, this method’s transformed base is a wavelet base with limited length and attenuation [
19,
20,
21]. Ref. [
22] used WT to detect a person’s fall with a radar installed on the ceiling. The WT method uses the wavelet decomposition coefficients at a given scale to determine the times at which falls may occur. The short-time Fourier transform (STFT) decomposes the entire time-domain process into many short segments of equal length, each of which is approximately stationary, and then applies the Fourier transform to each segment. Compared with STFT, the wavelet transform replaces the infinitely long trigonometric basis with a finite-length, decaying wavelet basis. This wavelet basis can be translated and scaled so that the signal can be analyzed at different times and in different frequency ranges. However, the wavelet transform is not suitable for our application. If it were used to process the collected data, the appropriate wavelet basis and scale function would differ for each action, and processing the echo data with different wavelet bases and scale functions would reduce the comparability of the resulting posture features. In contrast, since the posture of the elderly changes slowly, the signal in each STFT window can be regarded as approximately stationary, and the feature vectors obtained by STFT are more comparable.
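The windowing idea behind STFT can be sketched in a few lines of NumPy. This is only an illustration under assumed settings: the function name `stft_magnitude`, the Hann window, the frame length, the hop size, and the two-tone test signal are not taken from this paper.

```python
import numpy as np

def stft_magnitude(signal, frame_len, hop):
    """Magnitude STFT: split into equal-length windowed frames, then FFT each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One row per time frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signal: 50 Hz during the first second, 120 Hz during the second.
fs = 1000.0
t = np.arange(2000) / fs
signal = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))

spec = stft_magnitude(signal, frame_len=200, hop=100)
freqs = np.fft.rfftfreq(200, d=1.0 / fs)
peak_freqs = freqs[spec.argmax(axis=1)]   # dominant frequency in each frame
print(peak_freqs[0], peak_freqs[-1])
```

Because every frame uses the same window and the same frequency grid, the per-frame spectra are directly comparable, which is the property the paragraph above relies on.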
The classification method is used to classify different poses after obtaining the time-frequency information and micro-Doppler features of different human body movements. Researchers have developed a variety of classification methods for high-dimensional data. The KNN method is one of them. Ref. [
23] used the KNN algorithm to classify poses based on the ratio and difference of the human contour bounding box. Hämäläinen M. et al. combined the KNN algorithm with UWB radar to detect the posture of the human body without camera surveillance. Furthermore, they also analyzed the reliability and fault tolerance of the UWB radar network framework. The experimental results show that the accuracy of the system can reach 99% for the static posture of the human body [
24]. Besides, the SVM is also widely used in classification scenarios. Ref. [
25] used SVM to classify heel acceleration and plantar pressure data to determine whether the human body is in a sedentary posture. Mizumoto T. et al. [
26] combined the RF algorithm and Kinect sensor with a microbehavior sensing system. This system can identify microbehaviors by extracting features from data. The accuracy rate can reach 78%.
Currently, comprehensive systems that integrate data collection, hardware improvements, and classification methods to classify human posture have attracted the attention of many researchers. Sizhe An et al. [27] developed an intelligent medical assistance system based on millimeter waves. The main purpose of the system is to help people with movement disorders recover through exercise. In rehabilitation treatment, patients are required to perform standard movements, and judging whether a movement is standard requires very precise joint positioning. However, such high-precision movement judgment increases the complexity and manufacturing cost of the system. Furthermore, the system in [27] needs an additional Nvidia Jetson Xavier-NX to work, which is inconsistent with the purpose of monitoring the elderly. Sengupta A et al. [28] tracked the human skeleton, using the reflected signals of a millimeter-wave radar to detect 15 different skeletal joints. The movements they considered focus on the upper limbs, such as the swinging of the arms, and a CNN is used to structure an image from the radar echo information. In contrast, our work focuses on changes in the overall posture of the human body; when monitoring the elderly, the overall posture is not judged from the changes of a few joints. Xue H et al. [29] used millimeter-wave radar and motion capture technology to perform real-time 3D modeling of the human body’s active posture. The posture obtained in that work is very detailed and can be presented on a computer in real time through a motion capture system. However, the VICON motion capture system used in that work requires a large space and the assistance of cameras, so it is not suitable for home use.
3. System Model and Problem Statement
Most human pose recognition and classification methods use cameras and optical sensors to collect data and determine human poses in real time. However, dim light and long acquisition distances lead to inaccurate fused data. Radar has fewer restrictions and high accuracy in acquiring human posture data, and no other equipment needs to be attached to the human body to collect real-time posture data. In addition, cameras collect data in the form of images or videos, so they are more likely than radar to leak the privacy of the monitored person. Radar collects echo data directly, so privacy and confidentiality are better protected. We propose a human posture recognition method based on UWB radar, and the structure of the proposed system is shown in Figure 2:
1. UWB radar is responsible for collecting human posture data, including: from standing to sitting, from sitting to standing, walking in place, falling, periodic boxing. The dynamic motion of the human body relative to radar produces the Doppler effect. The echo data collected by radar is converted into time-frequency data by STFT, which reflects the movement characteristics of the human body.
2. Six motion features of the human body are extracted according to the Doppler shift and micro-Doppler features of the human body, and these six features are saved in vector form. The system classifies the human pose and compares different kernel functions in the SVM to select the appropriate kernel function for indoor human posture recognition.
3. The model is trained on the collected data sets to evaluate the accuracy of recognition and classification.
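To make the notion of comparing SVM kernel functions concrete, the sketch below computes Gram matrices for three common kernels with NumPy. The text above does not list which kernels are compared, so the linear, polynomial, and RBF kernels, their hyperparameters, and the toy two-dimensional feature vectors are all assumptions for illustration.

```python
import numpy as np

def linear_kernel(a, b):
    """K(x, y) = x . y"""
    return a @ b.T

def polynomial_kernel(a, b, degree=3, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (a @ b.T + coef0) ** degree

def rbf_kernel(a, b, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Hypothetical feature vectors (the paper uses six-dimensional vectors).
features = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])

gram_linear = linear_kernel(features, features)
gram_poly = polynomial_kernel(features, features)
gram_rbf = rbf_kernel(features, features)
print(gram_linear, gram_poly, gram_rbf, sep="\n")
```

An SVM only sees the data through such a Gram matrix, so changing the kernel changes which postures appear similar to the classifier.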
When the signal is transmitted to the target, part of its energy is reflected by the scatterers that constitute the target and is observed by the radar. Joseph Keller [30] proposed that when the operating frequency of the radar is high enough, the scattering behavior of complex targets can be succinctly modeled as the sum of scattering responses based on simple scattering mechanisms. Therefore, a physical model of the measured radar echo $E(f_n, k)$ can be constructed [31]:

$$E(f_n, k) = \sum_{m=1}^{M} S_m(f_n, k; \theta_m) \exp\left(-j \frac{4 \pi f_n}{c} R_m(k)\right) + w(f_n, k)$$

where $M$ is the order of the model; $S_m$ is the scattering behavior model of type $m$, which is a function of the sampling frequency $f_n$ and the pulse number $k$ and has a parameter $\theta_m$. During pulse $k$, the distance of the $m$-th scatterer is $R_m(k)$, $c$ is the speed of light, and $w(f_n, k)$ is additive white Gaussian noise. The observation is the accumulation of $K$ pulses and $N$ radar echo signals with frequencies $f_n$. When the object characteristics of the target are much smaller than the resolution of the radar, it is sufficient to limit the above equation to a sum of point-type scatterers. Setting aside free variables that cannot be estimated accurately and unambiguously, the scattering behavior model can be simplified as $S_m(f_n, k; \theta_m) = A_m(k)$. Thus,

$$E(f_n, k) = \sum_{m=1}^{M} A_m(k) \exp\left(-j \frac{4 \pi f_n}{c} R_m(k)\right) + w(f_n, k)$$

where $A_m(k)$ is the scattering coefficient of the $m$-th feature on pulse $k$. Therefore, the human body can be regarded as many point scatterers propagating in space. Through the STFT of the observed radar echo signal, the distance can be directly connected with the Doppler effect.
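The link between scatterer distance and Doppler shift can be checked numerically for a single point scatterer. The sketch below assumes the standard two-way phase term 4πfR/c and a constant radial velocity; the carrier frequency, pulse repetition frequency, range, and velocity are hypothetical values, not parameters of the radar used in this paper.

```python
import numpy as np

C = 3.0e8  # speed of light in m/s

def point_scatterer_echo(carrier_hz, ranges_m, amplitude=1.0):
    """Noise-free echo of one point scatterer with two-way phase 4*pi*f*R/c."""
    return amplitude * np.exp(-1j * 4.0 * np.pi * carrier_hz * ranges_m / C)

# Hypothetical radar and target parameters.
prf = 1000.0        # pulses per second
carrier = 3.0e9     # Hz
velocity = -1.5     # m/s, negative means the scatterer approaches the radar

k = np.arange(1000)                      # pulse index
ranges = 3.0 + velocity * k / prf        # range of the scatterer at each pulse
echo = point_scatterer_echo(carrier, ranges)

# The Doppler shift appears as the peak of the spectrum over the pulse index.
spectrum = np.abs(np.fft.fft(echo))
doppler_axis = np.fft.fftfreq(len(k), d=1.0 / prf)
measured = doppler_axis[spectrum.argmax()]
expected = -2.0 * velocity * carrier / C  # f_d = -2 v f / c
print(measured, expected)
```

A human body would contribute many such terms with different ranges and velocities, which is why a time-frequency transform is needed to separate torso and limb components.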
When the human body moves, the UWB radar receives an echo signal that contains the motion information of the human body, in which the Doppler shift reflects the speed of human movement. Time-frequency analysis of the echo signal of human motion yields the micro-Doppler characteristic data. The normalized energy, variance, skewness, and kurtosis of the radar echo are used to distinguish two motion behaviors: from sitting to standing and from standing to sitting. We mainly use the PCA method to reduce dimensions according to the mean value, variance, and speed of the PCA coefficients and to recognize the human actions walking, falling, sitting, and boxing. The following six micro-Doppler features are used to express human motion posture: Torso Doppler Frequency, Total Bandwidth (BW) of the Doppler Signal, Total Doppler Signal Offset, Torso Doppler Frequency Offset, BW of the Doppler Signal Without Offset, and Standard Deviation (STD) of Normalized Doppler Signal Strength. Torso Doppler Frequency is the average frequency corresponding to the maximum energy of each frame in the radar echo signal. The highest and lowest frequencies of each time window constitute the high-frequency and low-frequency envelopes. Total BW of the Doppler Signal represents the movement of the human body and is the average difference between the maximum frequency of the high-frequency envelope and the minimum frequency of the low-frequency envelope. Total Doppler Signal Offset is the mean of the minimum frequency of the low-frequency envelope and the maximum frequency of the high-frequency envelope. Torso Doppler Frequency Offset is the average of the maximum and minimum torso frequencies. BW of the Doppler Signal Without Offset is the average difference between the maximum frequency of the low-frequency envelope and the minimum frequency of the high-frequency envelope, which describes the up-and-down swing of the human torso. STD of Normalized Doppler Signal Strength represents the interaction between human motion and the micro-Doppler radar.
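One possible reading of the six feature definitions above is sketched below in NumPy. Several details are assumptions rather than statements from this paper: the envelopes are found with a relative energy threshold (`rel_threshold`), signal strength is taken as the per-frame sum of spectrogram magnitudes, and the function name and the toy spectrogram are hypothetical.

```python
import numpy as np

def micro_doppler_features(spec, freqs, rel_threshold=0.1):
    """spec: (n_frames, n_freqs) magnitudes; freqs: (n_freqs,) in Hz, ascending."""
    torso = freqs[spec.argmax(axis=1)]            # strongest frequency per frame
    # Assumption: a bin belongs to the signal if it exceeds a fraction of the frame peak.
    active = spec >= rel_threshold * spec.max(axis=1, keepdims=True)
    upper = np.array([freqs[row].max() for row in active])   # high-frequency envelope
    lower = np.array([freqs[row].min() for row in active])   # low-frequency envelope
    strength = spec.sum(axis=1)                   # assumed per-frame signal strength
    return {
        "torso_frequency": torso.mean(),
        "total_bandwidth": upper.max() - lower.min(),
        "total_offset": 0.5 * (upper.max() + lower.min()),
        "torso_offset": 0.5 * (torso.max() + torso.min()),
        "bandwidth_without_offset": upper.min() - lower.max(),
        "strength_std": np.std(strength / strength.max()),
    }

# Hypothetical three-frame spectrogram on a 5 Hz grid from -100 Hz to 100 Hz.
freqs = np.arange(-100, 101, 5).astype(float)
spec = np.zeros((3, freqs.size))
for i, (low, high, peak) in enumerate([(-20, 40, 10), (-10, 60, 20), (-30, 50, 10)]):
    spec[i, (freqs >= low) & (freqs <= high)] = 0.5   # limb (micro-Doppler) band
    spec[i, freqs == peak] = 1.0                      # torso line

features = micro_doppler_features(spec, freqs)
print(features)
```

The resulting dictionary can be flattened into the six-element feature vector that is passed to the classifier.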
Because human motion signals are non-stationary, non-linear, and transient, it is not easy to obtain parameters that effectively describe and distinguish these complex and changeable signals by a simple time-frequency transformation. Therefore, the features are mainly extracted from the time-frequency spectrum obtained by time-frequency analysis of the radar echo signal of human motion. For example, the spectral features of STFT are used for motion state recognition.
In general, there are many kinds of motion signal eigenvalues. Different extraction methods and complexity will lead to different operation times. In addition, the classification accuracy changes with the feature selection. Therefore, in classifying human motion state recognition, it is of great significance to select the appropriate feature value according to the motion signal’s characteristics to improve the accuracy of recognition and classification.
Problem Statement: To address the low recognition rate of time-frequency analysis in human pose recognition, we consider the following problems: (1) for the PCA method, the feature extraction needs to be improved; (2) for the SVM classification model, a kernel function suitable for human pose recognition needs to be selected.
6. Conclusions
In this experiment, the PCA-LSTM algorithm combined with the SVM classification method is used to classify and recognize human posture. To avoid physical harm to elderly people during data collection, two healthy males and one female (22–26 years old) were selected to perform five postures while imitating the slow movements of the elderly. The experiment showed that the movement speed of the subject and the degree of similarity between movements are the limitations of this experiment. During data collection for the similar postures from standing to sitting and from sitting to standing, the subject slowed down the movement and increased the range of body movement, and the main joints of the legs all moved at a very slow speed. However, this pair of similar postures can still be distinguished according to the direction of movement of the torso, which shows that the system retains a certain degree of robustness under the disturbance of similar postures.
Apart from the actions of the subject, the position of the radar also has a great impact on the data of this experiment. The radar is placed directly in front of the subject at the height of the chest and abdomen. This height allows the UWB radar to collect as much information about the subject’s movement as possible within its radiation angle range. Although the data collected when the viewing angle deviates from the front of the radar are very similar to the data collected directly in front, the change of angle changes the feature vector, so the model needs to be retrained.
In future research, we will focus on the detection and classification of posture when the action changes slowly. Because the elderly move slowly indoors, they may slowly fall to the ground or sit down when symptoms occur. At present, this situation can only be observed by a camera and cannot be determined by radar.