3.1. The Gait Motion Tracking System
We use optical motion trackers to record high-precision body motion traces to identify the target person. Before identification, the participants are required to wear a set of optical trackers and then walk naturally, in a straight line, through a flat test field with a length of less than 5 m, which can be done in a very short period of time. There are 76 participants in total (46 females and 30 males), with ages ranging from 20 to 60. The heights of the participants range from 144 to 178 cm, and their weights range from 42 to 115 kg. The sampling frequency of the body locations is 5 Hz, and 10 lower-body locations are recorded in each frame, including the thighs, knees, shins, ankles, and tiptoes on both the left and right sides. The obtained walking lengths of the participants differ, ranging from 2.37 to 4.15 m. An example of the recorded gait data is shown in
Figure 1, in which 5 gait motion samples of one participant are plotted. Visually, it is hard to identify the target person without using a proper gait recognition method.
In this paper, we consider a gait recognition problem using high-precision optical gait motion trackers. Before identifying the target, the system prepares a training dataset covering $K$ different persons, denoted as $\{(\mathbf{x}_k, \mathbf{y}_k, \mathbf{z}_k, l_k)\}_{k=1}^{K}$, where $\mathbf{x}_k$, $\mathbf{y}_k$, and $\mathbf{z}_k$ denote the recorded 3D coordinates of the trackers of person $k$, and $l_k$ denotes its label. Note that the training data must contain at least one complete walking cycle of each person. Given a new person to be identified, the system records $T$ consecutive motion frames, denoted as $\{(\mathbf{x}^{(t)}, \mathbf{y}^{(t)}, \mathbf{z}^{(t)})\}_{t=1}^{T}$, where $\mathbf{x}^{(t)}$, $\mathbf{y}^{(t)}$, and $\mathbf{z}^{(t)}$ denote the 3D coordinates of the $t$-th gait motion frame. When the classifier is properly trained, our problem becomes identifying the target person according to the input motion data.
. As shown in
Figure 2, the proposed first-classification-then-fusion method mainly includes the following four steps:
(1) Feature extraction: For each person, the input raw data include 10 motion tracks, which cannot be directly used to classify the target person. In this paper, we want to identify the target person from a short gait motion capture trace; extracting features from a long motion tracker time series is not practical in this situation, since each person may only have several gait cycles and the amount of collected gait motion data is quite limited. As such, a feature extraction process that only extracts relative location distance and speed features from single-frame data is proposed to obtain an expressive feature representation of the input data, and the extracted features are then used as the input of the following identification process.
(2) Unreliable feature calibration: Although the OMCS can record highly precise gait trajectory data, we observe that a recorded motion instance may be biased due to sensing failure or noise interference, and the features of the biased motion data are then also unreliable. Therefore, it is necessary to detect and calibrate the unreliable feature data and relieve their impact on the classification performance.
(3) Classification: In this paper, we use a kernel extreme learning machine (KELM) to perform the classification task on the feature data of each single gait motion frame. We provide performance comparison results in the experimental section to demonstrate the advantages of KELM in classification accuracy and efficiency for gait classification.
(4) Decision fusion: The number of motion frames differs among persons due to the variation of walking speed. With the obtained KELM outputs of all the frames, we then need to combine them into a unified decision to obtain the final global result. Compared with single-frame classification, we expect an improvement in classification accuracy after combining the decisions of multiple frames, and the decision fusion rule plays a vital role in the final fusion accuracy.
In the next subsections, we will give a detailed illustration of the above four steps, along with their mathematical models.
3.2. Feature Extraction
Given the gait motion data, we use the relative distances of the tracker locations as the features of the input motion data. The reason is that the relative distance metric can depict both the walking action and the physical body shape characteristics, which can effectively distinguish the gait motions of two persons. In frame $t$, for two trackers $i$ and $j$ with coordinates $\mathbf{p}_i^{(t)} = \left(x_i^{(t)}, y_i^{(t)}, z_i^{(t)}\right)$ and $\mathbf{p}_j^{(t)} = \left(x_j^{(t)}, y_j^{(t)}, z_j^{(t)}\right)$, their Euclidean distance can be computed by
$$D_{ij}^{(t)} = \left\| \mathbf{p}_i^{(t)} - \mathbf{p}_j^{(t)} \right\|_2 = \sqrt{\left(x_i^{(t)} - x_j^{(t)}\right)^2 + \left(y_i^{(t)} - y_j^{(t)}\right)^2 + \left(z_i^{(t)} - z_j^{(t)}\right)^2}.$$
In this way, we can obtain a pairwise distance matrix $\mathbf{D}^{(t)} \in \mathbb{R}^{10 \times 10}$ that contains the relative distances between the 10 optical trackers, in which $D_{ij}^{(t)} = D_{ji}^{(t)}$ and $D_{ii}^{(t)} = 0$. Since $\mathbf{D}^{(t)}$ is symmetric and its diagonal elements are all 0, we only use the elements of the upper (or, equivalently, the lower) triangle of $\mathbf{D}^{(t)}$ as the feature representation. Let $\mathbf{d}^{(t)}$ be the one-dimensional vector that contains all the elements of the strict upper triangle of $\mathbf{D}^{(t)}$; the resulting feature dimension is then $10 \times 9 / 2 = 45$.
It has been shown that gait speed can be used as a feature for human identification and age prediction [23]; thus, in addition to the relative distances among the optical trackers, we also include the x-axis speed of each tracker in the feature vector. For tracker $i$, let the x-axis coordinates in frames $t$ and $t-1$ be $x_i^{(t)}$ and $x_i^{(t-1)}$; then we can estimate the x-axis velocity as follows:
$$v_i^{(t)} = \frac{x_i^{(t)} - x_i^{(t-1)}}{\Delta t},$$
where $\Delta t$ denotes the time period between two consecutive gait motion frames. In this paper, $\Delta t = 0.2$ s, which corresponds to the 5 Hz sampling frequency. In this way, we can obtain a velocity feature vector $\mathbf{v}^{(t)} = \left[v_1^{(t)}, \ldots, v_{10}^{(t)}\right]$ with one entry per tracker. Combined with the relative distance features, we finally obtain the gait feature vector of motion frame $t$ as $\mathbf{f}^{(t)} = \left[\mathbf{d}^{(t)}, \mathbf{v}^{(t)}\right]$, whose dimension is $45 + 10 = 55$. Note that, before the classification step, the obtained features need to be normalized since their magnitudes differ.
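To make the feature construction concrete, the following Python sketch (our illustration, not the authors' implementation) builds the 55-dimensional feature vector of one frame from the tracker coordinates; the array layout and the function name are assumptions.

```python
import numpy as np

def extract_frame_features(frame_xyz, prev_frame_xyz, dt=0.2):
    """Build the 55-dim feature vector of one gait motion frame.

    frame_xyz, prev_frame_xyz: (10, 3) arrays with the x, y, z coordinates
    of the 10 lower-body trackers in the current and previous frames.
    dt: time between two consecutive frames (0.2 s at 5 Hz sampling).
    """
    # Pairwise Euclidean distances between the 10 trackers (10 x 10 matrix).
    diff = frame_xyz[:, None, :] - frame_xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep only the strict upper triangle: 10 * 9 / 2 = 45 distance features.
    iu = np.triu_indices(10, k=1)
    d_feat = dist[iu]

    # x-axis velocity of each tracker estimated by a finite difference.
    v_feat = (frame_xyz[:, 0] - prev_frame_xyz[:, 0]) / dt

    return np.concatenate([d_feat, v_feat])  # 45 + 10 = 55 features
```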
3.3. Unreliable Feature Calibration
Due to sensing failure or noise interference, the obtained tracker data may be biased and even become outliers. Accordingly, the obtained features will also be unreliable and will have a negative impact on the classification performance of the gait recognition task; thus, the unreliable features need to be detected and calibrated. In this paper, we use a hypothesis-test-style method to identify the unreliable features, in which we first estimate the probability density function (PDF) of each feature and then flag the feature values whose cumulative probabilities exceed or fall below the probability thresholds. Since the features are irregularly distributed and cannot be reasonably depicted by one specified distribution, we use the kernel density estimation method to estimate the PDFs of the features. Given the samples of one feature, $\mathbf{u} = [u_1, u_2, \ldots, u_M]$, where $M$ denotes the number of feature samples, the PDF at point $u$ is estimated as follows [31]:
$$\hat{p}(u) = \frac{1}{M h} \sum_{m=1}^{M} K\!\left(\frac{u - u_m}{h}\right),$$
where $h$ denotes the kernel bandwidth parameter. In this paper, we use the radial basis function as the kernel function, i.e., $K(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right)$, and the bandwidth parameter $h$ is computed by the rule given in [32]. With the obtained PDF, we can then obtain the corresponding cumulative distribution function (CDF) by $\hat{P}(u) = \int_{-\infty}^{u} \hat{p}(s)\,\mathrm{d}s$. Let $\alpha$ be the probability threshold that decides whether a feature value is unreliable or not. In other words, a feature value $u_m$ will be regarded as unreliable when it satisfies the following condition:
$$\hat{P}(u_m) > \alpha \quad \text{or} \quad \hat{P}(u_m) < 1 - \alpha.$$
The above criterion requires us to compute the CDF value $\hat{P}(u_m)$ every time, which is not efficient enough. We can instead compute the corresponding upper bound $u_{\mathrm{ub}}$ and lower bound $u_{\mathrm{lb}}$ with respect to $\alpha$ and $1-\alpha$ by $u_{\mathrm{ub}} = \hat{P}^{-1}(\alpha)$ and $u_{\mathrm{lb}} = \hat{P}^{-1}(1-\alpha)$, respectively. In this way, expression (4) is equivalent to
$$u_m > u_{\mathrm{ub}} \quad \text{or} \quad u_m < u_{\mathrm{lb}}.$$
For a feature value $u_m$ that is judged as unreliable, in this paper we simply calibrate it by replacing it with the mean value of the feature, both when $u_m > u_{\mathrm{ub}}$ and when $u_m < u_{\mathrm{lb}}$. Note that we do not delete the unreliable features, because the remaining features are still reliable, and a sufficiently strong classifier may still be able to classify the target with the calibrated data. In the following multi-frame decision fusion process, the soft decision of a calibrated motion frame is still useful for recognizing the target person.
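The sketch below illustrates the detection-and-calibration procedure for a single feature dimension. It is a minimal illustration, not the authors' code: the Gaussian-kernel KDE follows the formula above, the bandwidth uses a common rule of thumb (an assumption, since the exact rule of [32] is not reproduced here), and detected outliers are replaced with the feature's mean value.

```python
import numpy as np

def calibrate_feature(u, alpha=0.999, grid_size=512):
    """Detect and calibrate unreliable values of one feature dimension.

    u: 1-D array with the values of one feature over all available frames.
    alpha: probability threshold; values whose estimated CDF lies outside
           [1 - alpha, alpha] are treated as unreliable.
    """
    u = np.asarray(u, dtype=float)
    M = u.size
    # Rule-of-thumb bandwidth (an assumption; the paper follows the rule in [32]).
    h = 1.06 * u.std() * M ** (-0.2) + 1e-12

    # Gaussian-kernel density estimate evaluated on a grid covering the data.
    grid = np.linspace(u.min() - 3 * h, u.max() + 3 * h, grid_size)
    pdf = np.exp(-0.5 * ((grid[:, None] - u[None, :]) / h) ** 2).sum(axis=1)
    pdf /= M * h * np.sqrt(2 * np.pi)

    # Numerical CDF and the bounds u_lb, u_ub corresponding to the threshold.
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    u_ub = grid[np.searchsorted(cdf, alpha)]
    u_lb = grid[np.searchsorted(cdf, 1.0 - alpha)]

    # Calibrate unreliable values by replacing them with the feature's mean.
    calibrated = u.copy()
    calibrated[(u > u_ub) | (u < u_lb)] = u.mean()
    return calibrated
```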
Figure 3 shows an example of the detected unreliable features, in which (a) and (b) plot the tracker distance features and the gait speed, respectively, and (c) and (d) plot the PDFs of (a) and (b), respectively. When the probability threshold $\alpha$ = 99.9%, we find 1 and 5 outliers in (a) and (b), respectively, and the outlier points are marked with red circles. In this situation, the corresponding features are calibrated to their mean values to relieve their impact on the classification performance.
3.4. Classification with Single Gait Motion Frame
With the obtained features, we then use a pattern recognition classifier to identify the target person. In principle, any classifier with acceptable classification performance can be used, such as the commonly used support vector machine (SVM) or random forest. In the proposed decision-fusion-based gait recognition method, the classification performance on each single gait motion frame has a critical influence on the subsequent fusion accuracy; thus, it is necessary to find a classifier with powerful classification capability. However, from the test results, we found that the classification performances of SVM and random forest are not satisfactory. In this paper, we use KELM as the base classifier for its outstanding classification accuracy and efficiency. Let $\mathbf{f}_1, \ldots, \mathbf{f}_N$ be the feature vectors extracted from the training motion data; then we can obtain the corresponding kernel Gram matrix $\mathbf{\Omega} \in \mathbb{R}^{N \times N}$ by using the kernel function, in which the element $\Omega_{mn}$ is computed by
$$\Omega_{mn} = \kappa(\mathbf{f}_m, \mathbf{f}_n),$$
where $\kappa(\cdot,\cdot)$ denotes the kernel function with inputs $\mathbf{f}_m$ and $\mathbf{f}_n$. In this paper, we use the Gaussian kernel function, and $\kappa(\mathbf{f}_m, \mathbf{f}_n)$ is computed by
$$\kappa(\mathbf{f}_m, \mathbf{f}_n) = \exp\!\left(-\frac{\left\|\mathbf{f}_m - \mathbf{f}_n\right\|^2}{2\gamma^2}\right),$$
where $\gamma$ denotes the bandwidth parameter, which is usually selected from a set of candidate values. In the training process of the KELM classifier, our goal is to obtain the output weight matrix $\boldsymbol{\beta}$, which is computed by [33]
$$\boldsymbol{\beta} = \left(\mathbf{\Omega} + \frac{\mathbf{I}}{C}\right)^{-1} \mathbf{Y},$$
where $C$ denotes the regularization parameter, which is also selected from a set of candidate values, and $\mathbf{Y} \in \{-1, +1\}^{N \times K}$ is the training label matrix, in which the entry corresponding to the true class of a sample is $+1$ and all other entries are $-1$. Note that the parameters $\gamma$ and $C$ can be set by different trials, and the values with maximal classification performance are chosen. Now, given a new input feature vector $\mathbf{f}$, we can obtain the corresponding output of the KELM classifier as follows:
$$\mathbf{o} = \left[\kappa(\mathbf{f}, \mathbf{f}_1), \ldots, \kappa(\mathbf{f}, \mathbf{f}_N)\right] \boldsymbol{\beta}.$$
Note that $\mathbf{o} = [o_1, \ldots, o_K]$ is a vector that contains continuous predictions, one per class, and we need an additional decision-making step if we want to obtain the final discrete predicted label, namely the hard decision.
Remark 1. A relatively larger value of a KELM output means that the probability that the target belongs to the corresponding class is higher. Accordingly, the class with the maximal output is regarded as the hard decision of the classifier. For a KELM classifier, if an output $o_k$ is closer to −1, then it is more probable that the target does not belong to the corresponding class $k$. On the other hand, if $o_k$ is closer to 1, then it is more probable that the target belongs to class $k$.
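For concreteness, a minimal numpy sketch of KELM training and prediction, following the formulas above, is given below; the function names and the way parameters are passed are our own assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * gamma ** 2))

def kelm_train(F, labels, num_classes, gamma, C):
    """Return the KELM output weights beta = (Omega + I/C)^(-1) Y.

    F: (N, 55) training feature matrix; labels: (N,) integer class indices.
    Y uses the +1 / -1 coding described above.
    """
    N = F.shape[0]
    Y = -np.ones((N, num_classes))
    Y[np.arange(N), np.asarray(labels)] = 1.0
    Omega = rbf_kernel(F, F, gamma)
    beta = np.linalg.solve(Omega + np.eye(N) / C, Y)
    return beta

def kelm_predict(F_new, F_train, beta, gamma):
    """Continuous KELM outputs: one row per new frame, one column per class."""
    return rbf_kernel(F_new, F_train, gamma) @ beta

# Hard decision for a single frame: the class with the maximal output, e.g.
# hard_label = np.argmax(kelm_predict(f[None, :], F_train, beta, gamma)[0])
```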
Since we want to combine the decisions of multiple motion frames, the outputs are transformed into fuzzy decisions; the details are introduced in the following subsection.
3.5. Decision Fusion of Multiple Motion Frames
To combine the decisions of the consecutive motion frames, in this paper we propose a reliability-weighted sum rule (RWS) that adjusts the fuzzy decisions by considering the differences among them. In general, if a decision is relatively more consistent with the other decisions, it is more reliable; otherwise, it is less reliable. In RWS, the obtained outputs of the KELM classifier are first transformed into fuzzy decisions by using a fuzzy membership function. More specifically, for frame $t$, the fuzzy membership $\mu_k^{(t)}$ that the target person belongs to class $k$ is defined by Equation (11) as an increasing function of the standardized output $\left(o_k^{(t)} - \bar{o}^{(t)}\right)/s^{(t)}$, where $\bar{o}^{(t)}$ and $s^{(t)}$ denote the average value and the standard deviation of the output vector $\mathbf{o}^{(t)}$, respectively. Parameter $\sigma$ is used to adjust the discriminative degree of the obtained membership values: a larger value of $\sigma$ produces a larger span of the memberships, and the discriminative degree is therefore higher. In this paper, a fixed default value of $\sigma$ is used.
Remark 2. The above fuzzy transformation means that, within one classifier output vector, a relatively larger output of one class is transformed into a relatively larger membership compared with the other classes, and a relatively smaller output into a relatively smaller membership. In particular, in KELM, we set the label of the class that the person belongs to as +1, and the labels of the other classes as −1. For example, in a classification task with 5 classes, the decision label vector when the person belongs to class 2 is $[-1, +1, -1, -1, -1]$. Given a new gait instance that belongs to class 2, the KELM output corresponding to class 2 is expected to be relatively larger than the other outputs, and the fuzzy decision computed by Equation (11) will assign class 2 a relatively larger membership; that is, a relatively larger output value produces a larger fuzzy membership value.
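Since the exact form of Equation (11) is not reproduced here, the following sketch uses one plausible membership function built from the stated ingredients, namely a logistic function of the standardized KELM outputs with $\sigma$ controlling the discriminative degree; it is an illustrative stand-in rather than the paper's exact formula.

```python
import numpy as np

def fuzzy_decision(o, sigma=1.0):
    """Map a KELM output vector o (one value per class) to fuzzy memberships.

    A logistic function of the standardized outputs is used here as a
    stand-in for Equation (11): larger outputs give larger memberships,
    and a larger sigma spreads the memberships further apart.
    """
    z = (o - o.mean()) / (o.std() + 1e-12)
    return 1.0 / (1.0 + np.exp(-sigma * z))

# Example: an output vector favouring class 2 (0-based index 1).
# fuzzy_decision(np.array([-0.9, 0.7, -0.8, -0.6, -0.7]))
```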
With the above membership transformation process, we can obtain the fuzzy decisions of all the motion frames, which are denoted as $\boldsymbol{\mu}^{(1)}, \ldots, \boldsymbol{\mu}^{(T)}$. One can directly combine the fuzzy decisions by using some classical decision fusion rules, such as the sum rule, product rule, and majority voting rule [34]. However, the reliabilities of the decisions are not considered in these rules, which may decrease the accuracy of the final fusion results. If we can obtain a reasonable reliability for each fuzzy decision, then the impact of the misclassified decisions can be reduced, and the classification accuracy of the global fusion result can be improved.
As such, in this paper we propose a reliability estimation method that uses the consistency degrees among the fuzzy decisions. In belief function theory, the consistency degree between two basic belief assignments (BBAs) $m_1$ and $m_2$ is defined as follows [34]:
$$c(m_1, m_2) = \sum_{A \cap B \neq \emptyset} m_1(A)\, m_2(B).$$
Following the above definition, we define the consistency degree between two fuzzy decisions $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\mu}^{(t')}$ in the same way, with the focal sets restricted to the singleton classes $\{1\}, \ldots, \{K\}$. Since we do not have compound classes in the above equation, the consistency degree equals the inner product of $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\mu}^{(t')}$, as given by
$$S_{t,t'} = \left\langle \boldsymbol{\mu}^{(t)}, \boldsymbol{\mu}^{(t')} \right\rangle = \sum_{k=1}^{K} \mu_k^{(t)}\, \mu_k^{(t')}.$$
If the obtained consistency value is relatively larger, then $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\mu}^{(t')}$ are more similar to each other; otherwise, they are more conflicting with each other.
Remark 3. For a complex classification task with multiple classes, there may exist several outputs with relatively large fuzzy membership values. For example, consider one fuzzy decision in which the memberships of two classes are much larger than those of the remaining classes, so that both of them are probable results, and another fuzzy decision in which the membership of a single class is much larger than those of the other classes. If the target belongs to the class singled out by the second decision, then, when the reliability weights of the two fuzzy decisions are the same, the first (ambiguous) fuzzy decision will impose a higher negative impact on the final decision than the second one. As such, it is necessary to allocate a relatively smaller reliability weight to the ambiguous fuzzy decision to avoid misclassification risks.
Remark 4. According to the above definition of the decision consistency degree, for two fuzzy decisions $\boldsymbol{\mu}^{(t_1)}$ and $\boldsymbol{\mu}^{(t_2)}$, if the memberships of $\boldsymbol{\mu}^{(t_1)}$ are all larger than the corresponding memberships of $\boldsymbol{\mu}^{(t_2)}$, then $c\!\left(\boldsymbol{\mu}^{(t_1)}, \boldsymbol{\mu}^{(t)}\right) \geq c\!\left(\boldsymbol{\mu}^{(t_2)}, \boldsymbol{\mu}^{(t)}\right)$ for any fuzzy decision $\boldsymbol{\mu}^{(t)}$. According to this property, a fuzzy decision with more than one relatively large membership will more probably have a larger consistency degree than a fuzzy decision with only one relatively large membership. For example, consider two fuzzy decisions that assign the same large membership to class 2, where the first additionally assigns a large membership to class 3 and the second does not; then, for any third fuzzy decision, the consistency degree of the first decision is at least as large as that of the second. This is because the first decision indicates that both class 2 and class 3 are very probable to be the target class, and its consistency degree is therefore larger.
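Because Equation (14) reduces the consistency degree to an inner product, the consistency matrix of all $T$ fuzzy decisions can be computed with a single matrix product, as the sketch below shows; the commented example illustrates the monotonicity property discussed in Remark 4.

```python
import numpy as np

def consistency_matrix(M):
    """Pairwise consistency degrees (inner products) of the fuzzy decisions.

    M: (T, K) matrix whose t-th row is the fuzzy decision of frame t.
    Returns the symmetric (T, T) matrix S with S[t, t'] = <mu_t, mu_t'>.
    """
    return M @ M.T

# A decision that is large on two classes (row 0) is at least as consistent
# with any third decision as one that is large on a single class (row 1):
# M = np.array([[0.2, 0.8, 0.8, 0.2, 0.2],
#               [0.2, 0.8, 0.2, 0.2, 0.2],
#               [0.3, 0.3, 0.7, 0.2, 0.3]])
# S = consistency_matrix(M)   # S[0, 2] >= S[1, 2]
```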
With the above consistency computation method, we can obtain a consistency matrix $\mathbf{S} \in \mathbb{R}^{T \times T}$ that contains the pairwise consistency values among the fuzzy decisions, in which $S_{t,t'}$ denotes the consistency degree between $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\mu}^{(t')}$. It can be expected that, for a fuzzy decision $\boldsymbol{\mu}^{(t)}$, if its consistency degrees with the other fuzzy decisions are relatively larger than those of the other fuzzy decisions, then it is more consistent with the other fuzzy decisions, and its reliability degree should be higher. To achieve this goal, we use the eigenvalue decomposition method (EDM) [35] to compute the reliability weight of each fuzzy decision. In EDM, we compute the eigenvalues $\lambda$ and eigenvectors $\mathbf{w}$ of $\mathbf{S}$, which satisfy the following condition:
$$\mathbf{S}\,\mathbf{w} = \lambda\,\mathbf{w}.$$
We can see that one eigenvalue corresponds to a unique eigenvector. The above EDM problem can be solved by using the well-known singular value decomposition (SVD) method [35]. When all the eigenvalues and eigenvectors are obtained, we use the eigenvector $\mathbf{w}^{*}$ associated with the maximal eigenvalue $\lambda_{\max}$ as the decision weight vector. Note that the obtained eigenvector $\mathbf{w}^{*}$ cannot be directly used as the reliability weights if it is not normalized. Let $\tilde{\mathbf{w}}$ be the normalized weight vector, which is computed by
$$\tilde{w}_t = w_{\mathrm{lb}} + \frac{\left(w_t^{*} - w_{\min}\right)\left(w_{\mathrm{ub}} - w_{\mathrm{lb}}\right)}{w_{\max} - w_{\min}},$$
where $w_{\mathrm{lb}}$ and $w_{\mathrm{ub}}$ denote the lower bound and upper bound of the normalized weight, respectively, and $w_{\max}$ and $w_{\min}$ denote the maximal and minimal values of $\mathbf{w}^{*}$, respectively. In this paper, $w_{\mathrm{lb}}$ and $w_{\mathrm{ub}}$ are set to fixed values; thus, all the obtained reliability weights fall into the interval $\left[w_{\mathrm{lb}}, w_{\mathrm{ub}}\right]$.
Remark 5. It has been shown that the eigenvector can be used as a representation of the importance of each row (or column) of the consistency matrix [36,37]. More specifically, a fuzzy decision with relatively larger consistency values produces a relatively larger corresponding element of the eigenvector $\mathbf{w}^{*}$. With this property, the eigenvector $\mathbf{w}^{*}$ can be used as the reliability degrees of the fuzzy decisions: a fuzzy decision with a larger average consistency value corresponds to a relatively larger eigenvector element, and it is more reliable than the other fuzzy decisions. Next, we can combine all the fuzzy decisions into a unified global one by using the obtained reliability weights, as given by
$$\boldsymbol{\mu}^{\mathrm{g}} = \sum_{t=1}^{T} \tilde{w}_t\, \boldsymbol{\mu}^{(t)}.$$
At last, the final decision is made by choosing the class with the maximal global membership value, as given by
$$\hat{k} = \arg\max_{k \in \{1, \ldots, K\}} \mu_k^{\mathrm{g}}.$$
The above fusion process is suitable for classifying feature data with different numbers of gait motion frames. It can be expected that the fusion accuracy will increase as the number of fused decisions (i.e., the motion frame number $T$) increases. In general, only a few gait cycles will be sufficient to achieve robust fusion accuracy.
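The remaining fusion steps can be sketched compactly: take the principal eigenvector of the consistency matrix, min-max normalize it into $[w_{\mathrm{lb}}, w_{\mathrm{ub}}]$, and use the result as weights in the sum of fuzzy decisions. The default bounds in the sketch below are illustrative assumptions, since the paper's exact values are not reproduced here.

```python
import numpy as np

def rws_fuse(M, w_lb=0.1, w_ub=1.0):
    """Reliability-weighted sum fusion of the fuzzy decisions in M (T x K).

    Returns the normalized reliability weights, the global fuzzy decision,
    and the index of the finally predicted class.
    """
    S = M @ M.T                                   # consistency matrix
    eigvals, eigvecs = np.linalg.eigh(S)          # S is symmetric
    w = eigvecs[:, np.argmax(eigvals)]            # principal eigenvector
    w = np.abs(w)                                 # fix the arbitrary sign

    # Min-max normalization of the weights into [w_lb, w_ub].
    span = w.max() - w.min()
    w_tilde = w_lb + (w - w.min()) * (w_ub - w_lb) / (span + 1e-12)

    mu_global = w_tilde @ M                       # weighted sum of decisions
    return w_tilde, mu_global, int(np.argmax(mu_global))
```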
The detailed process of the proposed RWS rule is illustrated in Algorithm 1. We first train the KELM classifier by using Equation (8). Note that, in the proposed method, only the KELM classifier needs to be trained; the RWS rule requires no training and can be directly used to combine the fuzzy decisions. Given several new motion frames, we obtain the outputs of the KELM classifier and transform them into the fuzzy decisions $\boldsymbol{\mu}^{(1)}, \ldots, \boldsymbol{\mu}^{(T)}$. Then, we compute the consistency matrix $\mathbf{S}$ using the obtained fuzzy decisions. Subsequently, eigenvalue decomposition is applied to matrix $\mathbf{S}$, and the obtained eigenvector $\mathbf{w}^{*}$ with the maximal eigenvalue is used to represent the reliability values of the fuzzy decisions. We then normalize $\mathbf{w}^{*}$ into a suitable interval to obtain the reliability vector $\tilde{\mathbf{w}}$ and combine the fuzzy decisions with a weighted sum operation. Finally, the classification result is chosen as the class with the maximal global fuzzy membership.
Algorithm 1 The proposed RWS decision fusion rule.
Input: Motion frame data $\{(\mathbf{x}^{(t)}, \mathbf{y}^{(t)}, \mathbf{z}^{(t)})\}_{t=1}^{T}$; RBF kernel parameter $\gamma$; KELM regularization parameter $C$.
Output: Classification result $\hat{k}$.
1: for $t = 1, \ldots, T$ do
2:   Compute the classification output $\mathbf{o}^{(t)}$ by using Equation (10);
3:   Compute the fuzzy decision $\boldsymbol{\mu}^{(t)}$ by using Equation (11);
4: end for
5: Compute the fuzzy decision consistency matrix $\mathbf{S}$ by using Equation (14);
6: Compute the eigenvalues and eigenvectors of the consistency matrix $\mathbf{S}$ by eigenvalue decomposition;
7: Find the eigenvector $\mathbf{w}^{*}$ with the maximal eigenvalue $\lambda_{\max}$;
8: Compute the decision reliabilities $\tilde{\mathbf{w}}$ by using Equation (16);
9: Compute the global fuzzy decision $\boldsymbol{\mu}^{\mathrm{g}}$ by using Equation (17);
10: Obtain the final classification result $\hat{k}$ by using Equation (18).
3.6. A Toy Example for Illustrating the Proposed RWS Rule
In this subsection, we present a toy example to give a better understanding of the proposed RWS rule. Consider a gait motion recognition problem with 5 possible persons, and suppose that we have 10 consecutive motion frames whose fuzzy decisions are shown in Table 1. In this example, the memberships of class 2 are randomly generated from the interval (0.1, 0.8), and the memberships of the other classes are randomly generated from the interval (0.1, 0.4). We can see that, except for one fuzzy decision, the membership values of class 2 in the fuzzy decisions are actually not very large. In particular, in two of the fuzzy decisions, the classes with the largest membership values are not class 2. It can be expected that their reliability degrees will be relatively smaller than those of the other fuzzy decisions.
Next, we compute the corresponding decision consistency matrix of the fuzzy decisions in Table 1, and the results are shown in Table 2, in which the $t$-th row and the $t$-th column correspond to fuzzy decision $\boldsymbol{\mu}^{(t)}$. The corresponding reliability weights of the fuzzy decisions are shown in Table 3. As expected, the reliabilities of the fuzzy decisions that do not favor class 2 are among the smallest of the 10 fuzzy decisions, and the reliability of the fuzzy decision with the largest class-2 membership is the largest. From this example, we can see that the obtained reliability weight of one fuzzy decision reasonably reflects its overall consistency with the other fuzzy decisions. Finally, with the obtained reliability weights, the global fuzzy decision is obtained by Equation (17), and its maximal membership is that of class 2, which shows that the final decision is class 2.
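Under the stated generation scheme, the toy example can be reproduced with a few lines of numpy; a fresh random draw will not match Tables 1, 2 and 3 exactly, but the fused decision should again favor class 2. The interval $[0.1, 1]$ used for the weight normalization is an assumption made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 fuzzy decisions over 5 classes: class 2 (index 1) drawn from (0.1, 0.8),
# all other classes drawn from (0.1, 0.4), as described in the toy example.
M = rng.uniform(0.1, 0.4, size=(10, 5))
M[:, 1] = rng.uniform(0.1, 0.8, size=10)

S = M @ M.T                                  # consistency matrix (Table 2 analogue)
eigvals, eigvecs = np.linalg.eigh(S)
w = np.abs(eigvecs[:, np.argmax(eigvals)])   # principal eigenvector
w_tilde = 0.1 + (w - w.min()) * 0.9 / (w.max() - w.min())  # assumed bounds [0.1, 1]

mu_global = w_tilde @ M                      # global fuzzy decision
print("reliability weights:", np.round(w_tilde, 3))
print("global fuzzy decision:", np.round(mu_global, 3))
print("predicted class:", np.argmax(mu_global) + 1)       # expected: class 2
```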