1. Introduction
People with disabilities form a special group in contemporary society, and physical impairments bring many inconveniences to their daily lives. To compensate for the missing upper limbs of amputees and improve their self-care ability, upper limb prostheses are used to replace part of the functions of the lost limbs [1,2]. For the control of upper limb prostheses, predicting the user’s movement intention is as important as identifying the type of upper limb action. Action recognition usually focuses on the complete action performed by the upper limb, i.e., the result of the action, such as drinking water, putting on shoes, or brushing teeth [3]. In contrast, intent prediction not only identifies the type of action performed by the user, but also focuses on how the action is performed: it moves from “what to do” to “how to do it” [4]. Intention prediction is not only applied in the field of assistive technology for the disabled; it also plays an important role in rehabilitation and healthcare [5,6,7].
In the human body, the information sources that can be used to control an upper limb prosthesis mainly include electrophysiological signals and mechanical signals [8]. The IMU has become one of the main means of measuring mechanical signals; it can measure the acceleration, angle, and other dynamic information of the limbs. With the increasing maturity and popularization of wearable sensors, IMUs have been widely adopted in action recognition and intent prediction [9,10]. Fuan et al. used inertial sensors to design a human action recognition system and achieved 95% accuracy [11]. Tong et al. placed inertial sensors on the hands of patients with Parkinson’s disease to capture the acceleration of the wrist; a neural network model was then used to identify hand tremors and thus recognize symptoms [12].
The use of the IMU to control assistive prosthetic hands has become a hot research topic. A key problem in controlling an assistive prosthetic hand is choosing an appropriate machine learning algorithm. When faced with large amounts of data, machine learning algorithms can, to a certain extent, improve the efficiency of recognition: after learning from a large amount of data, they can identify and predict current activities based on new observations. Liu et al. designed a motion prediction system using the IMU; based on acceleration and angle data, a support vector machine was used to classify motion patterns with an average accuracy of 94.25% [13]. Yeaser et al. proposed a classification method for predicting rollator user intent from IMU data, in which a KNN classifier achieved 92.9% accuracy [14]. Although classifiers such as SVM and KNN can achieve a high action recognition rate, they cannot memorize long historical information, and they have been unsuccessful in experiments predicting human motion. Some machine learning algorithms, however, can achieve better data predictions. Altan et al. developed a new hybrid wind speed prediction model based on the LSTM network and a grey wolf optimizer decomposition method; the resulting model captures the nonlinear characteristics of wind speed time series and, in terms of accuracy, outperforms single prediction models [15]. In the financial market, to make high-precision price predictions for digital currencies, they also developed a hybrid prediction model based on the LSTM neural network, empirical wavelet transform decomposition, and the cuckoo search algorithm; this hybrid model can capture the nonlinear properties of digital currency time series [16].
Generally, in addition to high accuracy, controlling an assistive prosthetic hand must meet the following requirements: (1) low prediction delay; and (2) a smooth and continuous transition between different activities [17]. Human activity data are, in essence, time series data; that is, subsequent data are correlated with previous data [18]. Therefore, after training a neural network on a large amount of data, previous observations help predict the trend of subsequent data, enabling a smooth transition between different activities. To meet the requirement of low prediction delay, the motion of the upper limb must be predicted from observation data collected in real time while the assistive prosthetic hand is being controlled. As time goes on, the volume of observation data also increases. Some machine learning algorithms need to analyze a complete observation sequence before making an action prediction, and the growing length of the observation data increases the running time of the algorithm, resulting in a high control delay. Examples are recurrent neural networks (RNNs) and dynamic time warping algorithms: although these algorithms can achieve high recognition accuracy, they are ill-suited to low-latency requirements [19,20].
To address the inability of some algorithms to memorize long historical information, this paper designs an action intention prediction model. Exploiting the memory function of the LSTM neural network, the LSTM is used to predict the motion data of the upper limb, thereby reducing the control delay of the manipulator. This paper therefore proposes a new method for predicting the action intention of the upper limb, aimed at reducing the delay in controlling the assistive prosthetic hand. The method predicts the hand movement of the assistive prosthetic hand while the user completes a foot action: by predicting the angular velocity of the hand to judge the motion of the upper limb, the delay in controlling the prosthetic hand can be reduced to some extent. This paper concerns healthy people performing essential foot actions in daily life, including putting on shoes, putting on socks, and tying shoelaces. The IMU is used to collect the angular velocity data of the upper limb. Based on the motion data of healthy people’s upper limbs, the correlation between palm movement and arm movement is analyzed while they perform foot movements. Based on the motion data of the arm, the LSTM network is used to predict the motion of the palm, so as to reduce the delay in controlling the manipulator. In addition, combined with the motion data of each part of the upper limb, the LSTM network is used to identify the motion state of the upper limb. Finally, based on the prediction results of the LSTM, the assistive prosthetic hand is controlled to reproduce the foot actions.
Section 2 introduces the process of data acquisition and the correlation analysis of the arm and the hand in detail. Section 3 describes the extraction method of the feature parameters and the key methods in the long short-term memory neural network model. Section 4 presents the experimental results in detail and discusses the findings. Finally, conclusions are given in Section 5.
2. An Overview
Three IMUs were used to acquire the angular velocity of the upper limb during movement, installed on the shoulder, forearm, and palm; Figure 1 shows the installation locations of the sensors. Each IMU collects data every 20 milliseconds, i.e., at a frequency of 50 Hz, and the accuracy of the sensor is 0.2 degrees. The intact upper extremity of a healthy person usually includes the shoulder, forearm, and palm. The palm can be seen as an extension of the upper arm, and the movements of the fingers play an extremely important role in grasping objects.
The correlation between the movement of the arm and that of the palm was studied while healthy subjects completed the foot actions. Angular velocity data for the three parts of the upper limb were recorded during the actions of putting on socks, putting on shoes, and tying shoelaces, and the Pearson correlation coefficient between the arm data and the palm data was calculated to measure the correlation between the arm and the palm. Table 1 shows the Pearson correlation coefficients for the arm and the palm. It can be seen from the table that, for all action types, the correlation coefficients between the arm and the palm are greater than 0.6, indicating a strong correlation between the movement of the arm and the movement of the palm. Therefore, this study uses a machine learning method to infer hand movements from the movements of the arm.
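As an illustration, the Pearson correlation coefficient used here can be computed directly from two angular-velocity traces. The traces below are synthetic stand-ins for the recorded arm and palm signals, not the paper’s data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Illustrative only: a synthetic arm trace and a correlated, noisy palm trace.
t = np.linspace(0.0, 1.0, 50)
arm = np.sin(2 * np.pi * t)
palm = 0.8 * np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=50)
r = pearson_r(arm, palm)  # strongly correlated signals give r close to 1
```

A coefficient above 0.6, as reported in Table 1, is conventionally read as a strong linear relationship.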
To grasp objects steadily, a healthy person opens or closes the fingers while the upper limb is at rest. Finding the moment when the upper limb is at rest is therefore the key to understanding the grasping intention of the upper limb and to controlling the assistive prosthetic hand. When the user intends to grasp, the arm moves towards the target position, and this switching of the upper limb’s motion state is a direct manifestation of the grasping intention. While the upper limb moves towards the target position, its angular velocity first grows larger and larger; before the target position is reached, the angular velocity becomes smaller and smaller until it is close to 0. Through this kinematic analysis, the motion states of the upper limb can be divided into rest, acceleration, and deceleration; in the initial state, the upper limb is at rest by default. Figure 2 shows the switching flow of the upper limb state. The acceleration and deceleration states can switch to each other in both directions, whereas the rest state and the acceleration state, or the rest state and the deceleration state, cannot switch to each other in both directions.
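One plausible reading of this state diagram, consistent with the kinematic description above (the limb leaves rest only by accelerating and returns to rest only by decelerating), can be sketched as a small transition table; the state names and the exact transition set are illustrative assumptions, since Figure 2 is not reproduced here:

```python
# Assumed transitions: acceleration and deceleration may alternate freely;
# transitions involving rest are one-way only.
ALLOWED = {
    "rest": {"acceleration"},
    "acceleration": {"deceleration"},
    "deceleration": {"acceleration", "rest"},
}

def can_switch(current, nxt):
    """Return True if the upper-limb state machine permits current -> nxt."""
    return nxt in ALLOWED.get(current, set())
```

Under this reading, a predicted deceleration-to-rest transition is the trigger for controlling the prosthetic hand.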
The moment when the upper limb comes to rest is used as the signal for controlling the assistive prosthetic hand. In previous research, the motion state of the upper limb was judged from the collected observation data, and the assistive prosthetic hand was controlled once the motion state was at rest. Because control starts only after the upper limb is already at rest, this increases the delay in controlling the assistive prosthetic hand. Therefore, in this work, the user’s motion state is first identified from the current observation data, and the subsequent motion state of the upper limb is then predicted; when the predicted state is rest, the assistive prosthetic hand is controlled.
3. Implementation of Key Model and Methods
Based on the above correlation analysis, this paper uses a long short-term memory neural network to predict the motion data of the palm. This section introduces the extracted time-domain features in detail, as well as the design of the parameters used in the LSTM.
3.1. Feature Extraction
Due to bias drift, geomagnetic interference, and other causes, the raw data collected by the inertial sensor are mixed with noise; this paper therefore uses the moving average filtering method to filter the raw data. Extracting feature parameters is one of the important ways to characterize sequence data. When analyzing data collected by inertial sensors, the features that are often extracted fall into three categories: time-domain features, frequency-domain features, and time-frequency features [21]. Considering that the sensor output is a set of time series data, this paper directly uses time-domain features to analyze the angular velocity data of the upper limb. The feature parameters include the variance, the successive differences, and the range between the maximum and minimum of the angular velocity of each part of the upper limb.
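The moving average filtering step can be sketched as follows; the paper does not specify the window length, so the 5-sample centered window here is an illustrative assumption:

```python
import numpy as np

def moving_average(signal, window=5):
    """Smooth a 1-D angular-velocity signal with a centered moving average.

    The window length is an assumption; edge samples are zero-padded by
    np.convolve's "same" mode, so the first and last few outputs are damped.
    """
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(signal, dtype=float), kernel, mode="same")
```

In practice the filter would be applied to each IMU channel before feature extraction.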
Variance can be used to measure the dispersion of a set of data. When analyzing the angular velocity of the upper limb, the magnitude of the variance represents the degree of fluctuation of a set of data, and thus the range of movement of the upper limb. The equation for variance is shown in Equation (1):

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\omega_i - \bar{\omega}\right)^2 \tag{1}$$

where $\omega_i$ represents the angular velocity of the $i$-th sampling point of a certain part of the upper limb, $N$ represents the length of the signal window, and $\bar{\omega}$ represents the average value of this set of data.
The magnitude of angular acceleration can characterize the direction of motion. Therefore, in a set of data, the difference between two adjacent angular velocity samples is recorded as $d_i$; the magnitudes of the successive differences characterize the motion state of the upper limb. The equation for the difference is shown in Equation (2):

$$d_i = \omega_{i+1} - \omega_i, \quad i = 1, \dots, N-1 \tag{2}$$
In a set of data, the difference between the maximum value and the minimum value represents the magnitude of the fluctuation of the data: the larger the difference, the larger the fluctuation. Therefore, the maximum and minimum values of a set of data are obtained, and their difference is recorded as $MN$. The equation for $MN$ is shown in Equation (3):

$$MN = \max_{1 \le i \le N}\omega_i - \min_{1 \le i \le N}\omega_i \tag{3}$$
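The three time-domain features above can be sketched for one window of angular-velocity samples as:

```python
import numpy as np

def time_domain_features(w):
    """Features for one window of angular-velocity samples w:
    variance (Eq. 1), successive differences (Eq. 2), and the
    max-minus-min range MN (Eq. 3)."""
    w = np.asarray(w, dtype=float)
    variance = np.mean((w - w.mean()) ** 2)    # Eq. (1)
    diffs = np.diff(w)                         # Eq. (2): d_i = w_{i+1} - w_i
    mn = w.max() - w.min()                     # Eq. (3)
    return variance, diffs, mn
```

These per-window values form the columns of the feature matrix used later for state recognition.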
3.2. Neural Network
The neural network model is an important component of machine learning. Human activity data are essentially time series data, so recognizing human actions amounts to classifying serialized data. To obtain better action recognition results, the entire time series must be analyzed: the current action depends not only on the current data, but also on the previous data. An RNN solves the problem of the traditional neural network model, which has no mechanism for memorizing historical information. In an RNN, the output depends not only on the current input, but also on the previous output, so previous historical information can be fully utilized. However, studies have shown that although an RNN can handle the dependencies within time series data, it has difficulty learning and preserving long-term historical information, and its performance on long sequences is poor [22]. The LSTM neural network, a special recurrent neural network, effectively solves this problem through its unique gate structure.
The LSTM neural network consists of an input layer, a hidden layer, and an output layer; the hidden layer, with its memory function, is the core of the LSTM neural network. Figure 3 shows the structure of the hidden layer unit. The hidden layer unit includes an input gate, an output gate, and a forget gate, represented by $i_t$, $o_t$, and $f_t$, respectively:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$

where $x_t$ is the input at the current moment, $h_{t-1}$ is the output of the hidden layer unit at the previous moment, $W$ is the input weight matrix, $U$ is the weight matrix between neurons in the hidden layer, and $b$ is the bias term.
The steps by which the neural network processes the data are as follows:
- (1)
According to the output $h_{t-1}$ at the previous moment and the current input $x_t$, the sigmoid function is used to calculate which information can pass through the forget gate $f_t$.
- (2)
The input gate $i_t$ controls how much of the previous information and the current information is saved, and the tanh function computes the candidate state that mixes the historical information with the current input:

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$

The updated current unit state can then be expressed as:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

- (3)
The output gate $o_t$, computed by the sigmoid function, filters the information learned up to the current moment; the new unit state $c_t$ is compressed through the tanh function to make the information more stable. Finally, the inner product of the output gate $o_t$ and the compressed new state is taken to obtain the hidden layer state $h_t$ at the current moment:

$$h_t = o_t \odot \tanh\left(c_t\right)$$

- (4)
When the input of the time series data is completed at the last moment, the hidden layer state $h_T$ of the long short-term memory neural network model is used as the input to the output layer of the network model. The softmax function is then used to calculate the final predicted action probability $y$:

$$y = \mathrm{softmax}\left(W_y h_T + b_y\right)$$
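Steps (1)–(3) can be sketched as a single hidden-layer step. The stacked weight packing, the tiny shapes, and the random parameters below are illustrative assumptions for demonstration, not the paper’s configuration:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, the gate activation used in the text."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM hidden-layer step following steps (1)-(3).
    W, U, b stack the forget, input, candidate, and output branches."""
    n = h_prev.size
    z = W @ x_t + U @ h_prev + b
    f_t = sigmoid(z[0:n])                # (1) forget gate
    i_t = sigmoid(z[n:2 * n])            # (2) input gate
    c_tilde = np.tanh(z[2 * n:3 * n])    # (2) candidate state
    c_t = f_t * c_prev + i_t * c_tilde   # (2) updated unit state
    o_t = sigmoid(z[3 * n:4 * n])        # (3) output gate
    h_t = o_t * np.tanh(c_t)             # (3) hidden layer state
    return h_t, c_t

# Illustrative shapes: 4 input features (the feature matrix columns), 3 hidden units.
rng = np.random.default_rng(0)
d, n = 4, 3
W = rng.normal(scale=0.1, size=(4 * n, d))
U = rng.normal(scale=0.1, size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=d), h, c, W, U, b)
```

Iterating this step over a window of feature vectors yields the final hidden state $h_T$ fed to the softmax output layer in step (4).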
3.3. Model Implementation Details
The action intention recognition of the upper limb includes two tasks: (a) identifying the grasping intention of the upper limb, mainly by identifying the motion state of the upper limb and finding the moment when the upper limb is stationary; and (b) predicting the movement of the upper limb, mainly by predicting the angular velocity data of subsequent time points from the angular velocity of the upper limb. In both the recognition and the prediction of upper limb movements, the LSTM neural network is used to analyze the angular velocity data of the upper limb. The flowchart of action intent recognition is shown in Figure 4.
For identifying the grasping intention of the upper limb, feature extraction is first performed on the training set of upper limb movement data, yielding four feature vectors; a feature matrix with N rows and four columns can therefore be formed. The motion states of the upper limb are divided into rest, acceleration, and deceleration. The LSTM is used to learn the sample data, with the neural network parameters set as shown in Table 2. The trained network model is then used to classify the observation data and judge the motion state of the upper limb.
For predicting the movements of the upper limb, the LSTM neural network is likewise trained on the preprocessed sample data, and the weight matrix W and the bias term b are obtained through continuous iterative learning. Table 3 lists the LSTM network parameter names and corresponding values. The specific steps are as follows:
- (1)
Initialize the parameters. In the neural network model, the weight matrix and bias term must be initialized first. The weight matrix is initialized from a zero-mean Gaussian distribution with a small standard deviation, which is in line with the usual weight distribution, and the initial bias is set to 0.
- (2)
Calculate the error between the actual value and the predicted value. The LSTM neural network is used to predict the observed value, and the predicted value $y$ of the output layer is obtained through the calculations above. The cross-entropy between the predicted value $y$ and the true value $\hat{y}$ is used as the error:

$$\mathrm{loss} = -\sum_{k} \hat{y}_k \log y_k$$
- (3)
Determine the weight matrix and bias term. Calculate the gradient of the loss function with respect to the weight matrix W and the bias term b, and backpropagate the gradient to the front of the network to adjust the parameters of each part. The loss function is iteratively reduced through stochastic gradient descent with momentum until convergence is reached.
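The momentum update in step (3) can be sketched as follows; the learning rate, momentum coefficient, and the quadratic stand-in objective are illustrative values, not the paper’s configuration:

```python
def sgd_momentum_step(param, grad, velocity, lr=0.05, momentum=0.9):
    """One stochastic-gradient-descent-with-momentum update, as in step (3).
    The velocity accumulates a decaying sum of past gradients."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

# Minimize f(w) = (w - 2)^2 as a stand-in for the network loss.
w, v = 0.0, 0.0
for _ in range(300):
    grad = 2.0 * (w - 2.0)  # analytic gradient of the stand-in loss
    w, v = sgd_momentum_step(w, grad, v)
```

Momentum smooths the update direction across iterations, which typically speeds convergence on noisy mini-batch gradients compared with plain SGD.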
The LSTM neural network was implemented in MATLAB, whose neural network toolbox provides the trainNetwork function used to train the LSTM parameters. Based on the Visual Studio 2019 platform, the host computer software for data acquisition and analysis was developed. Training and testing were run on a PC with an Intel Core i7-6500U CPU, 12 GB of DDR3 2400 MHz RAM, and an NVIDIA GeForce 940MX GPU.
5. Conclusions
To reduce the delay in controlling the assistive prosthetic hand, this paper proposes a new method for predicting the action intention of the upper limb. Based on the correlation between the angular velocity of the arm and that of the palm when a healthy person completes a foot action, the LSTM is used to predict the angular velocity of the hand. The motion information of the upper limb is collected by the IMU, including the angular velocity of the shoulder, forearm, and palm. The moment the upper limb comes to rest is used as the signal for controlling the assistive prosthetic hand. The motion states of the upper limb are divided into acceleration, deceleration, and rest, and the LSTM neural network identifies the motion state with an accuracy of up to 99%. In the action prediction of the upper limb, the LSTM is used to predict the angular velocity data of the palm, and the error between the actual and predicted values is measured by the root mean square error. For the actions of putting on shoes and putting on socks, the root mean square error is less than 1.5 rad/s; for the action of tying shoelaces, it is less than 4 rad/s. Finally, the neural network model is applied in an experiment on controlling the assistive prosthetic hand. The delay in controlling the assistive prosthetic hand is recorded and compared with the delay produced by our previous control method, yielding an average delay time of 0.65 s. Based on the analysis of the experimental data, it can be concluded that the LSTM neural network can achieve low prediction error.
Taking the motion information of the upper limb as the information source for judging upper limb intention avoids the influence of factors such as age, gender, and degree of amputation on the control of the prosthesis. The experimental results of this paper are of great help to research on control methods for assistive prosthetic hands. The research in this paper is limited to the design of the method, and there is still a long way to go before generalization to the applications needed by people with disabilities. In future research, it is necessary to focus on the design of the upper limb prosthetic system, combine the control method and hardware device more effectively, and apply the results to services for the disabled.