1. Introduction
Gait is a biological characteristic that describes the manner in which people walk [1]. Walking is one of the essential activities that maintain our daily life [2] and physical health. Surface electromyography (sEMG) is a weak bioelectric signal that characterizes, to some extent, the functional state between the human nervous system and muscles [3]. By analyzing the characteristics of sEMG signals obtained from the lower limbs, we can identify the gait phases of the gait cycle [4]. Gait classification based on sEMG signals has been widely used in the diagnosis of muscle diseases and as guidance in rehabilitation medicine [5].
Gait information includes video images, electromyography, three-dimensional motion data, and kinematics [6,7,8], etc. Three-dimensional motion capture is an accurate optical technique that can collect and record the 3D gait of the human body in real time and quantitatively analyze gait indicators such as spatiotemporal and kinematic parameters. It is commonly used in the capture and analysis of high-frequency and high-precision motion [9,10,11]. The sEMG signal reflects the activation degree of skeletal muscle and is highly correlated with muscle force [12]. Therefore, sEMG has been widely utilized in the field of gait analysis [13,14]. Gait changes caused by diseases, accompanied by neuromuscular changes, have also attracted extensive attention [15,16,17]. With the development of real-time monitoring systems, much research has been conducted to distinguish gait differences between patients and healthy people, and gait indicators can now be effectively evaluated [18,19,20].
The acquired sEMG signals require preprocessing, such as noise elimination and feature extraction, before they can be used for classification. Feature extraction directly affects the final performance of gait classification. Depending on how they are extracted, features can be classified into time domain (TD), frequency domain (FD), time-frequency domain, and nonlinear features [21]. The TD features are extracted directly from the original sEMG time series without applying any transformation. As a result, they are easy to implement and have low computational requirements [22]. However, as sEMG signals are susceptible to interference caused by physical fatigue and other factors, the TD features tend to suffer from severe abrupt changes and poor stability [23]. The FD features are derived from the Fourier transform of the signal and accurately characterize its spectral information; it is now customary to transform the signal from the time domain to the frequency domain for analysis [24]. However, both TD and FD features perform poorly on some data types. Hu et al. [25] observed that traditional time or frequency domain analysis methods are unable to meet the requirements of mechanical fault diagnosis, and that several dimensionless coefficients in high-dimensional feature sets reduce the accuracy and fault identification speed of the diagnostic system. To address this, Phinyomark et al. [23] used TD and FD features to classify upper limb movements from recorded EMG data and observed that combining these features improved the classification performance compared with single-domain features. Sejdic et al. [26] used gait accelerometers to extract gait features of the elderly in the time, frequency, and time-frequency domains. The results showed that different feature sets could better distinguish healthy people from patients with Parkinson's disease and capture more differences between groups.
Recently, the requirements for the classification accuracy of sEMG signals have increased [27,28]. Common myoelectric signal classification methods include the support vector machine (SVM) [29], linear discriminant analysis (LDA) [30], and the extreme learning machine (ELM) [31,32,33,34,35]. Vikas et al. [36] used SVM and LDA with TD features extracted from sEMG to build a gesture classification model and combined it with optimization algorithms, such as particle swarm optimization (PSO) and ant colony optimization (ACO), to improve accuracy. Zhao et al. [37] combined ELM with gas chromatography-mass spectrometry (GC-MS) to diagnose paraquat poisoning and compared it with six other methods, observing that ELM effectively distinguished the poisoned patients. Although a variety of methods that combine optimization algorithms with classification algorithms have been presented in the literature [13], research also shows that the traditional SVM easily falls into local optima, resulting in poor classification results. In addition, ELM is prone to overfitting. Moreover, the traditional algorithms rely on the extraction of TD and FD features [38], and the instability of TD features often leads to a decrease in the final classification accuracy.
In gait classification, differences in stride length, walking speed, and fatigue of the lower limb muscles [14] can lead to significant differences in the distribution of single features extracted from the time or frequency domains [39,40]. The deep belief network (DBN), a typical representative deep learning architecture, can discover distributed features and construct more abstract high-dimensional representations from low-dimensional features [41]. The model learns layer by layer from the low-dimensional signals using greedy learning and automatically obtains high-dimensional features. This not only avoids the complexity and uncertainty of traditional feature engineering, but also improves the generalization ability of the algorithm [42,43]. Qiu et al. [44] used DBN to forecast the intrinsic mode functions in electricity load demand and modeled each function to predict its trend; the final forecasts were derived from a combination of unbiased and weighted summation. Mohammad et al. [45] used DBN to extract depth features from fused signal observations for classifying five basic emotions; compared with traditional SVM, DBN significantly improved the accuracy of emotion classification and enhanced the nonlinear classification of emotions. Qiao et al. [46] combined cognitive computing, DBN, and collaborative robots to build a model; the experiments show that DBN significantly reduced the error rate through its number of neurons, network structure, and training epochs, laying a foundation for future performance improvements of collaborative robots. However, the parameters of these DBNs, which are often determined by human experience, not only induce human diagnostic errors but also affect the structure of the network, leading to high computational cost and slow training speed of the whole model [47]. Deng et al. [48] proposed a differential evolution algorithm based on quantum computation to optimize DBN and applied it to practical engineering problems; the results show that this algorithm has better optimization performance and classification accuracy than non-optimized DBN. Xue et al. [49] proposed the sparrow search algorithm (SSA), which improves the convergence speed, stability, and convergence accuracy of the model. Li et al. [50] used simulated annealing (SA), PSO, and SSA to develop improved DBN models by selecting the best model parameters; the results show that SSA-DBN achieves the highest assessment accuracy and is suitable for optimizing the network structure of DBN.
In this study, the TD and FD features are extracted from the sEMG signals, and their fused features are used as the input of a DBN model for gait classification. The SSA, which has good optimization performance, is used to adjust the network architecture of the DBN and to solve the problem of empirical selection of DBN parameters.
The major contributions of this work are as follows:
- (1)
The layer-by-layer learning feature of DBN can solve the distribution differences of feature sets caused by gait differences.
- (2)
To solve the problem of empirical selection of DBN parameters, the SSA, with its good optimization performance, is used to prevent the model from falling into local optima caused by traditional low-dimensional features in gait analysis.
- (3)
The proposed method effectively improves the accuracy of gait classification.
The rest of the manuscript is organized as follows.
Section 2 describes the proposed methods.
Section 3 presents the experimental results and discussion.
Section 4 concludes this work and presents the future work.
2. Materials and Methods
This experimental protocol comprises five parts, namely acquisition and pre-processing of experimental data, feature extraction from sEMG signals, construction of the deep belief network (DBN), parameter optimization with SSA, and gait classification. The flowchart of the proposed method is shown in Figure 1.
2.1. Lower Limb Muscle Selection and sEMG Signals Processing
2.1.1. Lower Limb Muscles and Gait Division
Considering the role and contribution of the lower limb muscles during different phases of walking, and the sensitivity of the sEMG acquisition device to these muscles, the muscles with distinct performance characteristics are selected as signal sources [51]. As presented in Figure 2, these include the tensor fasciae latae (TF), adductor longus (AL), rectus femoris (RF), vastus medialis (VM), tibialis anterior (TA), semitendinosus (ST), gastrocnemius (GM), and soleus (SO).
A complete gait cycle can be divided into stance and swing phases [52]. The stance phase can be further divided into pre-stance, mid-stance, and terminal-stance, and the swing phase into pre-swing and terminal-swing, as presented in Figure 3.
2.1.2. Signal Processing and Analysis
The surface electromyography (sEMG) signal is a complex, weak, and non-stationary electrical signal, which contains motion artifacts caused by electrode offset and other noise introduced during the acquisition process. Therefore, it is necessary to remove the noise efficiently. The denoising methods adopted in the experiment include wavelet threshold denoising, wavelet packet threshold denoising, and wavelet modulus maxima denoising [53].
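As a minimal illustration of the wavelet-threshold idea, the following sketch performs a single-level Haar transform with a universal soft threshold on the detail coefficients. It is only a conceptual example: the wavelet family, decomposition level, and noise amplitude here are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar wavelet transform (x must have even length)."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse single-level Haar transform."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def wavelet_denoise(x):
    """Soft-threshold the detail coefficients (universal threshold)."""
    approx, detail = haar_dwt(x)
    # Noise level estimated from the median absolute deviation of the details.
    sigma = np.median(np.abs(detail)) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(x.size))
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thr, 0.0)
    return haar_idwt(approx, detail)

# Noisy test signal: a slow oscillation plus white noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = wavelet_denoise(noisy)
```

In practice multi-level decompositions with smoother wavelets (and the wavelet packet and modulus maxima variants named above) are used; the thresholding step is the common core.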
2.2. Feature Extraction of sEMG Signals
After denoising, the TD and FD features of each channel of the sEMG signal are extracted. In this work, three representative time domain features, namely the mean absolute value (MAV), variance (VAR), and zero crossings (ZC), are used [54,55].
MAV exploits the property that sEMG signals have large amplitude fluctuations in the time domain that are linearly related to the level of muscle activation; the higher the MAV, the higher the activation level of the muscle. It is defined as follows:

MAV = (1/N) ∑_{i=1}^{N} |x_i|,  (1)

where x_i denotes the i-th sample of the sEMG time series within a window of length N.
VAR is a measure of the signal power of the sEMG signal and is expressed as follows:

VAR = (1/(N − 1)) ∑_{i=1}^{N} x_i².  (2)
ZC refers to the number of times the sEMG waveform crosses the zero level; a small threshold ε is applied to avoid spurious crossings caused by low-level noise. It is mathematically expressed as follows:

ZC = ∑_{i=1}^{N−1} sgn(−x_i x_{i+1}),  (3)

where sgn(x) = 1 if x ≥ ε, and 0 otherwise.
We select two representative frequency domain features, namely the average power frequency (MPF) and the median frequency (MF) [56], defined as follows:

MPF = ∫_0^∞ f P(f) df / ∫_0^∞ P(f) df,  (4)

∫_0^{MF} P(f) df = ∫_{MF}^∞ P(f) df = (1/2) ∫_0^∞ P(f) df,  (5)

where P(f) is the power spectral density of the sEMG signal and f is the frequency.
Each feature is extracted by dividing the signal into windows of length N, forming a set of feature vectors. Then, a feature matrix is formed for the selected lower limb muscles, where the number of rows represents the number of selected muscles and the number of columns corresponds to the windows into which the signal is divided. This feature matrix is used as the input data of the network in the next section.
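The windowed TD/FD feature pipeline described above can be sketched in numpy as follows. This is a hedged illustration, not the exact implementation: the threshold `eps`, the periodogram PSD estimate, the window length, and the step size are assumptions made for the example.

```python
import numpy as np

def td_features(window, eps=1e-4):
    """Time domain features of one window: MAV, VAR, thresholded ZC."""
    n = window.size
    mav = np.mean(np.abs(window))                         # mean absolute value
    var = np.sum(window ** 2) / (n - 1)                   # variance (signal power)
    zc = int(np.sum((-window[:-1] * window[1:]) >= eps))  # thresholded zero crossings
    return mav, var, zc

def fd_features(window, fs):
    """Frequency domain features from the periodogram PSD: MPF and MF."""
    psd = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    mpf = np.sum(freqs * psd) / np.sum(psd)               # average power frequency
    cum = np.cumsum(psd)
    mf = freqs[np.searchsorted(cum, cum[-1] / 2.0)]       # median frequency
    return mpf, mf

def feature_matrix(signals, win_len, step, fs):
    """One row per muscle channel, one block of five features per window."""
    rows = []
    for ch in signals:                                    # signals: (muscles, samples)
        feats = []
        for start in range(0, ch.size - win_len + 1, step):
            w = ch[start:start + win_len]
            feats.extend(td_features(w))
            feats.extend(fd_features(w, fs))
        rows.append(feats)
    return np.array(rows)

# Sanity check on a 50 Hz sine sampled at 1 kHz (10 full cycles per window).
fs = 1000.0
t = np.arange(200) / fs
sine = np.sin(2 * np.pi * 50 * t + 0.3)
mav, var, zc = td_features(sine)
mpf, mf = fd_features(sine, fs)
```

For a pure sinusoid the extracted MPF and MF both recover the oscillation frequency, which is a convenient check that the spectral estimates are wired correctly.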
2.3. Deep Belief Network
The deep belief network (DBN) is a probabilistic generative model constructed by stacking multiple restricted Boltzmann machines (RBMs). Its training process is divided into two parts, i.e., greedy unsupervised layer-wise pre-training and discriminative supervised fine-tuning. Please note that neurons within the same layer are not connected to each other; connections are only formed between adjacent layers [57].
The basic building block of the DBN is the RBM. One RBM is composed of one visible layer and one hidden layer. During the training of the DBN, each RBM is pre-trained layer by layer from bottom to top, with the hidden layer of the previous RBM used as the visible layer of the next RBM. Afterwards, the whole DBN model is fine-tuned by the back-propagation (BP) network set in the last layer. Finally, the output layer performs prediction according to the posterior probability distribution obtained from the previous layer.
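The greedy layer-wise pre-training just described can be sketched with a minimal Bernoulli RBM trained by one step of contrastive divergence (CD-1). This is a conceptual sketch under stated assumptions: the layer sizes, learning rate, and number of sweeps are illustrative, and the actual model additionally includes the supervised fine-tuning stage.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_batch(self, v0):
        # Positive phase: sample hidden units from the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer.
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # CD-1 gradient approximation.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * np.mean(v0 - v1, axis=0)
        self.b_h += self.lr * np.mean(h0 - h1, axis=0)

# Greedy layer-wise stacking: the hidden activations of one RBM
# become the visible data of the next.
data = (rng.random((64, 12)) < 0.5).astype(float)
rbm1, rbm2 = RBM(12, 8), RBM(8, 4)
for _ in range(50):
    rbm1.train_batch(data)
h1 = rbm1.hidden_probs(data)
for _ in range(50):
    rbm2.train_batch(h1)
```

The stacking step is the essential point: once `rbm1` is trained, its hidden probabilities `h1` serve as the visible layer of `rbm2`, exactly as the text describes.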
The basic network structure of the DBN model is shown in Figure 4. In this work, we define the learning rate factor controlling the weight update rate as η and the number of fine-tunings as K. Figure 4a–c shows the structure of the DBN model containing 1 RBM, 2 RBMs, and n RBMs, respectively. The first RBM is composed of the visible layer holding the feature matrix data obtained in the previous section and the first hidden layer h1. The parameters of the first RBM, including its weights and offset coefficients, are trainable. Then, h1 is treated as the visible vector and h2 as the hidden vector, and the second RBM is trained; the third RBM is trained in a similar fashion. The black circles in Figure 4 represent the neurons of each layer. The number of neurons is usually determined manually; in this work, the number of neurons is set as Best_pos(h_i), where h_i represents the i-th hidden layer and i = 1, 2, …, n.
The architecture of DBN possesses the ability to obtain higher dimensional features based on the layer-by-layer learning feature of this model. The hidden variables in each layer learn how to represent the high-order correlations of the original input data. In order to use DBN for classification, the feature vectors of the data samples are used to set the state of the visible variables in the bottom layer of DBN. This is followed by DBN generating a probability distribution of the possible labels of the data based on the posterior probability distribution of the data samples.
Let us assume that the dataset D = {(x_i, y_i)}_{i=1}^{m} contains m data sample pairs, where x_i is the i-th data sample and y_i is the corresponding i-th target label. Given a data sample x_i (i = 1, 2, …, m) from the dataset, the DBN with its hidden layers is represented as a complex feature mapping function. After feature conversion, a softmax layer is used as the output layer of the DBN to classify and predict the label y. If there are C neurons in the softmax layer, then the j-th (j = 1, 2, …, C) neuron is responsible for predicting the probability of the j-th class. The input of this neuron is the output h of the previous layer and is associated with the weight w_j and the offset b_j. The probability obtained by the softmax layer is mathematically expressed as follows:

P(y = j | h) = exp(w_j^T h + b_j) / ∑_{k=1}^{C} exp(w_k^T h + b_k),  (6)

where h denotes the output of the previous layer. Based on this probability estimate, the trained DBN classifier provides the following prediction:

ŷ = argmax_{j ∈ {1, …, C}} P(y = j | h).  (7)
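The softmax output layer and argmax prediction can be written in a few lines of numpy. The activations and weights below are made-up toy values used only to exercise the computation; the max-subtraction trick is a standard numerical-stability detail, not part of the formula itself.

```python
import numpy as np

def softmax_probs(h, W, b):
    """Class probabilities of a softmax output layer."""
    logits = h @ W + b
    logits = logits - logits.max()    # subtract max for numerical stability
    e = np.exp(logits)
    return e / e.sum()

def predict(h, W, b):
    """Predicted class index = argmax of the class probabilities."""
    return int(np.argmax(softmax_probs(h, W, b)))

# Toy example: 4 hidden activations, 3 classes (illustrative weights).
h = np.array([0.2, 0.9, 0.1, 0.5])
W = np.array([[ 0.3, -0.2, 0.1],
              [ 0.8,  0.1, -0.4],
              [-0.5,  0.6, 0.2],
              [ 0.0,  0.4, 0.7]])
b = np.zeros(3)
p = softmax_probs(h, W, b)
```

The probabilities always sum to one, and the predicted label is simply the index of the largest probability.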
The DBN is optimized by stochastic gradient descent on the negative log-likelihood loss over the training set D. The posterior of each layer is approximated by a factorial distribution of independent variables within the layer, whose values are provided by the variables in the previous layer. The purpose of the wake-sleep algorithm [57] is to learn the characteristics of the original data and recover them correctly. It obtains the weights of the top-level undirected connections by fitting an RBM on the posterior distribution of the penultimate layer. The fine-tuning process starts from the state of the top output layer and in turn activates each lower layer through top-down connections. Thus, a DBN model can be considered as an RBM formed by the top-level hidden variables of a directed belief network, combined with a set of "recognition" weights that perform fast approximate inference.
2.4. Sparrow Search Algorithm
The sparrow search algorithm (SSA) is a metaheuristic algorithm inspired by the foraging and anti-predation behavior of sparrows [49].
Let us suppose that a population of n sparrows searches for food in a d-dimensional space. The positions of the sparrows are expressed as follows:

X = [x_{1,1} x_{1,2} … x_{1,d}; x_{2,1} x_{2,2} … x_{2,d}; …; x_{n,1} x_{n,2} … x_{n,d}],  (8)

where d denotes the dimension of the problem variables to be optimized, n represents the number of sparrows, and i = 1, 2, …, n, j = 1, 2, …, d. At this point, the fitness values are expressed as follows:

F_X = [f(x_1); f(x_2); …; f(x_n)],  (9)

where f denotes the fitness value.
The sparrows with high fitness values act as discoverers and have a larger foraging search range than the joiners in the population. The location update of the discoverers during each iteration is described as follows:

x_{i,j}^{t+1} = x_{i,j}^{t} · exp(−i / (α · t_max)),  if R_2 < ST
x_{i,j}^{t+1} = x_{i,j}^{t} + Q · L,                  if R_2 ≥ ST,  (10)

where t is the current iteration, t_max is the maximum number of iterations, α is a uniformly distributed random number in the range (0, 1], and R_2 ∈ [0, 1] and ST ∈ [0.5, 1] denote the warning value and the safety value, respectively. Q is a random number subject to the normal distribution, and L is a 1 × d matrix whose elements are all 1. When R_2 < ST, there is no danger around the population and the discoverer can expand the search range to make the fitness values of other individuals higher. On the other hand, when R_2 ≥ ST, a predator is detected around the population and an alarm is released; as a result, all the sparrows quickly fly to other safe places for feeding.
The update of the joiners' positions during each iteration is described as follows:

x_{i,j}^{t+1} = Q · exp((x_worst^{t} − x_{i,j}^{t}) / i²),       if i > n/2
x_{i,j}^{t+1} = x_P^{t+1} + |x_{i,j}^{t} − x_P^{t+1}| · A⁺ · L,  otherwise,  (11)

where x_worst^{t} and x_P^{t+1} denote the worst global position in the t-th iteration and the best position occupied by the discoverers in the (t+1)-th iteration, respectively. A is a 1 × d matrix with internal elements randomly set to 1 or −1, and A⁺ = A^T (A A^T)^{−1}. When i > n/2, the i-th joiner with lower fitness has gained nothing from foraging and should shift its location to obtain higher energy.
The update of the positions after the population becomes aware of danger is described as follows:

x_{i,j}^{t+1} = x_best^{t} + β · |x_{i,j}^{t} − x_best^{t}|,                         if f_i > f_g
x_{i,j}^{t+1} = x_{i,j}^{t} + K · (|x_{i,j}^{t} − x_worst^{t}| / ((f_i − f_w) + ε)),  if f_i = f_g,  (12)

where x_best^{t} is the global optimal position of the current population, β is the step control parameter, a normally distributed random number with mean 0 and variance 1, and ε is a very small constant used to avoid a zero denominator. K ∈ [−1, 1] is a random number, f_i is the fitness value of individual i, and f_g and f_w are the optimal and the worst fitness values of the current population, respectively. When f_i > f_g, the current individual is at the edge of the population and is highly vulnerable to predators. When f_i = f_g, the current individual is in the middle of the population; when it senses danger, it should move closer to other sparrows to reduce the risk of being preyed upon.
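The three update rules above can be exercised on a toy sphere function with a compact numpy sketch. This is a hedged illustration only: the population size, search bounds, discoverer/scout counts, and especially the simplification of the A⁺ term to random ±1 signs are assumptions made for brevity, not the exact formulation used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

def ssa(fitness, n=30, d=2, t_max=100, n_disc=6, n_scout=3, st=0.8):
    """Minimal sparrow search over [-5, 5]^d; returns the best position found."""
    x = rng.uniform(-5, 5, (n, d))
    best_x, best_f = None, np.inf
    for t in range(t_max):
        f = np.array([fitness(xi) for xi in x])
        order = np.argsort(f)                 # ascending: best sparrow first
        x, f = x[order], f[order]
        if f[0] < best_f:                     # track the historical best
            best_x, best_f = x[0].copy(), f[0]
        worst = x[-1].copy()
        # Discoverer update: shrink when safe, jump when alarmed.
        r2 = rng.random()
        for i in range(n_disc):
            if r2 < st:
                alpha = rng.random() + 1e-9
                x[i] = x[i] * np.exp(-(i + 1) / (alpha * t_max))
            else:
                x[i] = x[i] + rng.standard_normal() * np.ones(d)
        # Joiner update (A+ simplified to random +/-1 signs).
        for i in range(n_disc, n):
            if i > n // 2:
                x[i] = rng.standard_normal() * np.exp((worst - x[i]) / (i + 1) ** 2)
            else:
                signs = rng.choice([-1.0, 1.0], d)
                x[i] = x[0] + np.abs(x[i] - x[0]) * signs / d
        # Danger-aware update for a random subset of scouts.
        for i in rng.choice(n, n_scout, replace=False):
            fi = fitness(x[i])
            if fi > f[0]:
                x[i] = x[0] + rng.standard_normal() * np.abs(x[i] - x[0])
            else:
                k = rng.uniform(-1, 1)
                x[i] = x[i] + k * np.abs(x[i] - worst) / (abs(fi - f[-1]) + 1e-12)
        x = np.clip(x, -5, 5)
    return best_x, best_f

best_pos, best_fit = ssa(lambda v: float(np.sum(v ** 2)))
```

On the sphere function the discoverer shrinkage alone pulls the population toward the origin, so the best fitness found should be far below that of a random initial point.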
In this work, the SSA is used to search for the sparrow with the best position among the parameters to be optimized in the DBN, i.e., the sparrow with the highest fitness. The parameters include the number of neurons Best_pos(h_i) per hidden layer, the number of reverse fine-tunings K, and the learning rate η mentioned in the previous section. The optimal network structure of the DBN is set based on the parameters of this sparrow at the end of each iteration.
2.5. Training Process of Gait Classification
The detailed steps of the proposed algorithm are presented below.
Step 1. We obtain the original sEMG signals dataset.
Step 2. We denoise the original signal dataset by using the wavelet modulus maximum method.
Step 3. The TD and FD features are extracted by using overlapping windows.
Step 4. The dataset is divided into training and test sets.
Step 5. We set the relevant parameters in the DBN model, including the number of RBM layers, the number of neurons in each layer, the number of iterations, the learning rate, and the number of reverse fine-tunings.
Step 6. We set the parameters of SSA, including the number of optimization parameters, the ratio of discoverers to joiners, and the safety threshold of the optimization parameter value.
Step 7. The DBN randomly generates the initial weights within the safety thresholds. The SSA updates the positions of the discoverers, the joiners, and the warning sparrows based on Equations (10)–(12), assigns the updated parameter values to the DBN model, and iteratively updates the value of the fitness function.
Step 8. We determine whether the termination condition is satisfied and whether the fitness function is at its current optimum. If not, return to Step 7; otherwise, proceed to Step 9.
Step 9. Finally, we obtain the minimum value of fitness function value and determine the DBN parameters, i.e., the optimal weight parameters of the DBN model.
Step 10. The trained model is evaluated based on the test set.
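In Step 7, the continuous sparrow positions must be mapped onto valid DBN hyper-parameters before each evaluation. One hedged way to perform this decoding (the parameter names and safety bounds below are illustrative assumptions, not the values used in the experiments) is:

```python
import numpy as np

# Assumed search bounds for the DBN hyper-parameters optimized by SSA:
# neurons in two hidden layers, learning rate, and fine-tuning epochs.
BOUNDS = {
    "h1":     (16, 256),     # neurons in hidden layer 1 (integer)
    "h2":     (16, 256),     # neurons in hidden layer 2 (integer)
    "eta":    (1e-4, 1e-1),  # learning rate (continuous)
    "epochs": (10, 200),     # reverse fine-tunings (integer)
}

def decode(position):
    """Map a continuous sparrow position vector onto valid DBN parameters."""
    params = {}
    for value, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        v = float(np.clip(value, lo, hi))   # keep inside the safety bounds
        if name != "eta":
            v = int(round(v))               # integer-valued parameters
        params[name] = v
    return params

# A sparrow position proposed by the optimizer (illustrative values).
params = decode(np.array([300.0, 57.3, 0.5, 42.7]))
```

Clipping to the safety bounds before rounding guarantees that every fitness evaluation in Steps 7–9 sees a structurally valid DBN configuration.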
The flowchart of the proposed algorithm is presented in Figure 5.