Article

GaitAE: A Cognitive Model-Based Autoencoding Technique for Gait Recognition

by Rui Li 1,2, Huakang Li 2,3,*, Yidan Qiu 4, Jinchang Ren 5, Wing W. Y. Ng 3 and Huimin Zhao 2,*

1 College of Fine Arts, Guangdong Polytechnic Normal University, Guangzhou 510665, China
2 Pattern Recognition and Intelligent System Laboratory, School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China
3 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
4 Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Center for the Study of Applied Psychology, School of Psychology, South China Normal University, Guangzhou 510631, China
5 National Subsea Centre, Robert Gordon University, Aberdeen AB21 0BH, UK
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2780; https://doi.org/10.3390/math12172780
Submission received: 29 July 2024 / Revised: 29 August 2024 / Accepted: 6 September 2024 / Published: 8 September 2024
(This article belongs to the Special Issue Mathematical Methods for Pattern Recognition)

Abstract: Gait recognition is a long-distance biometric technique with significant potential for applications in crime prevention, forensic identification, and criminal investigation. Existing gait recognition methods typically attach specific feature refinement modules to designated models, which increases parameter volume and computational complexity while lacking flexibility. In response to this challenge, we propose a novel framework called GaitAE. GaitAE efficiently learns gait representations from large datasets and reconstructs gait sequences through an autoencoder mechanism, thereby enhancing recognition accuracy and robustness. In addition, we introduce a horizontal occlusion restriction (HOR) strategy, which applies horizontal blocks to the original input sequences at random positions during training to minimize the impact of confounding factors on recognition performance. The experimental results demonstrate that our method achieves high accuracy and is effective when applied to existing gait recognition techniques.

1. Introduction

Gait recognition, a long-distance biometric technique, holds significant application potential in crime prevention, forensic identification, and criminal investigation [1]. The problem is challenging due to the complexity and diversity of gait patterns in real-world scenarios, where even slight variations can lead to significant difficulties in accurate identification. While existing methods perform well on controlled experimental data, their applicability in challenging real-world scenarios is limited. In real-world environments, gait sequences are often non-ideal and fragmented, as depicted in Figure 1. These low-quality sequences can severely hinder efficient gait recognition, emphasizing the need for models to extract and highlight valuable information while mitigating the impact of confounding factors. When confronted with low-quality input sequences, deep neural networks often exhibit instability [2]. Addressing the impact of such sequences on gait recognition remains an open question, as most prevalent models primarily focus on minimizing training errors during sample encoding [3,4,5,6].

Current methods [5,7,8] have made progress by emphasizing relevant information in input sequences. For instance, Wei et al. [5] proposed a multi-scale gait recognition network (GMSN), which integrates a multi-scale parallel convolution network and a local-based horizontal mapping module to prioritize critical gait regions. Hou et al. [7] introduced a gait quality-aware network (GQAN), which explicitly evaluates contour and part quality through quality blocks. Wang et al. [8] proposed cost-effective quality assessment strategies that maximize connected regions and use template matching to remove background noise and unidentifiable outlines, along with alignment techniques for non-standard poses. In addition, relation-based methods engage in spatio-temporal modeling to emphasize biologically plausible data [3]. These methods have made some progress, but they focus solely on optimizing the final gait feature descriptors without considering gait reconstruction, making them vulnerable to noise interference in real-world scenarios.

In this paper, we address these challenges by introducing the GaitAE method, which employs an autoencoder (AE) to enhance the reconstruction abilities of gait feature descriptors, significantly improving model performance without increasing computational complexity. Furthermore, we propose the horizontal occlusion restriction (HOR) strategy to minimize the impact of unexpected covariates. These innovations keep our approach efficient and practical for real-world applications, as they are only applied during training and add no extra parameters or computational complexity during the testing phase. Experimental validation on benchmark datasets showcases the effectiveness and robustness of our method. The significance of this work lies in its ability to improve gait recognition accuracy under challenging conditions while maintaining computational efficiency, making it highly applicable for large-scale deployment. Our key contributions are as follows:
(1) Designing an AE mechanism capable of adapting gait feature descriptors to sequences of different quality.
(2) Introducing the HOR strategy to enhance model robustness by minimizing the impact of confounding factors.
(3) Extensive experiments on benchmark datasets, including CASIA-B [9], OU-MVLP [10], and SUSTech1K [11], validate the effectiveness and robustness of our proposed method.
(4) Our method demonstrates flexibility and effectiveness when integrated into existing gait recognition methods.
In the following sections, we present the related work in Section 2, describe the proposed GaitAE method in Section 3, detail the datasets and experiments in Section 4, and conclude the paper in Section 5.

2. Related Work

2.1. Video-Based Methods

Current gait recognition methods commonly rely on video-based approaches. Techniques such as LSTMs [12,13] and time aggregation [14,15] are used to maintain temporal gait information. For instance, Zhang et al. [12] separated pose and appearance features from RGB gait sequences using an LSTM-based network, while Zhao et al. [13] integrated multi-view features through memory and capsule modules. However, these methods can be computationally inefficient due to unnecessary constraints on temporal sequences [14]. To address this, directly using gait video sequences as inputs and mapping features into horizontal strips for spatio-temporal preservation [14,15] has gained popularity. Approaches have diverged into 2D and 3D convolution directions. The former focuses on modeling spatio-temporal features by treating the human body as parts [3,4,5,15,16,17]. For example, Fan et al. [15] emphasized local short-range spatio-temporal features as discriminative for periodic gaits. On the other hand, 3D convolution extracts gait features from global to local perspectives [18,19,20]. Wei et al. [5] introduced a multi-scale gait recognition network (GMSN) that enhances the discriminative representation of gait by combining parallel convolutional networks with a local-based horizontal mapping module. Furthermore, some multimodal gait recognition methods integrate skeletons and silhouettes to capture rich spatio-temporal features [21,22]. For instance, Peng et al. [22] proposed a bimodal fusion network (BiFusion) that effectively combines skeleton and silhouette representations. Building on this, Hsu et al. [23] introduced GaitTAKE, a method combining temporal attention-based global and local appearance features with time-aggregated human pose features. Despite advancements in accuracy, these methods do not explicitly address the modeling of low-quality or non-ideal sequences.

2.2. Gait Quality Restoration Methods

Most gait quality restoration methods derive inspiration from adaptive weighting approaches based on image quality in set-based person reidentification [24]. In gait recognition, Hou et al. [7] introduced a gait quality-aware network (GQAN) which explicitly evaluates the quality of each silhouette and each segment. Wang et al. [8] proposed a series of cost-effective quality assessment strategies, including maximizing connected regions and employing template matching to eliminate background noise and unidentifiable outlines, alongside alignment techniques to handle non-standard poses. Inspired by these methods, we introduce an AE capable of reconstructing both typical and low-quality sequences, enhancing the reconstruction capabilities of gait feature descriptors for both types of sequences.

3. GaitAE

3.1. Gait Autoencoder

For a given gait recognition problem, we follow a three-step method to handle video-based gait recognition tasks, as follows:

$$f\left(X^{(n)}\right) = H\left(G\left(h\left(X^{(n)}\right)\right)\right) \tag{1}$$

where $X^{(n)}$ represents the $n$th input sample, $h$ is a sequence encoder used to extract frame-level features, the operation $G$ aggregates video features from different frames, and $H$ learns a discriminative representation of the training data distribution from the extracted features. We refer to the operations $G$ and $H$ as gait refinement.
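To make the pipeline of Equation (1) concrete, the following is a minimal PyTorch-style sketch. The class name, module arguments, and the use of temporal max pooling as the aggregation $G$ are illustrative assumptions, not the exact implementation used in this paper.

```python
import torch
import torch.nn as nn

class GaitPipeline(nn.Module):
    """Sketch of Equation (1): f(X) = H(G(h(X))). Names are illustrative."""
    def __init__(self, frame_encoder: nn.Module, refinement: nn.Module):
        super().__init__()
        self.h = frame_encoder   # frame-level feature extractor h
        self.H = refinement      # discriminative refinement head H

    @staticmethod
    def G(frame_features: torch.Tensor) -> torch.Tensor:
        # G: aggregate frame-level features over the temporal axis
        # frame_features: (batch, frames, channels, height, width)
        return frame_features.max(dim=1).values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t = x.shape[:2]
        frame_feats = self.h(x.flatten(0, 1))                      # encode every frame
        frame_feats = frame_feats.view(b, t, *frame_feats.shape[1:])
        return self.H(self.G(frame_feats))                         # refine aggregated features
```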
Equation (1) outlines the standard steps in existing gait recognition methods. Unlike traditional approaches, we incorporate a decoder g within the encoder h to form an autoencoder (AE) framework, illustrated in Figure 2. This AE framework, with $n = 1, \ldots, N$, is designed to reconstruct input gait sequences, capturing essential details while filtering out noise. The reconstruction task is crucial for improving gait recognition accuracy and serves as a regularization mechanism to prevent overfitting. The model, by learning both reconstruction and recognition, develops shared representations that are generalizable and discriminative. The AE is typically trained by minimizing the mean squared error (MSE) between the input and output as follows:
$$R_{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( X^{(n)} - g\left(h\left(X^{(n)}\right)\right) \right)^2 \tag{2}$$
The triplet loss is defined as:
$$L_{triplet} = \sum_{n=1}^{N} \left[ \left\| f\left(X_{anchor}^{(n)}\right) - f\left(X_{pos}^{(n)}\right) \right\|^2 - \left\| f\left(X_{anchor}^{(n)}\right) - f\left(X_{neg}^{(n)}\right) \right\|^2 + \alpha \right]_{+} \tag{3}$$

where $X_{anchor}^{(n)}$, $X_{pos}^{(n)}$, and $X_{neg}^{(n)}$ represent the anchor, positive, and negative samples of the $n$th instance, respectively. The function $f(\cdot)$ denotes the embedding function that maps inputs to the feature space, and $\alpha$ is a margin that enforces a separation between positive and negative pairs. The triplet loss ensures that the distance between the anchor and the positive sample is smaller than the distance between the anchor and the negative sample by at least the margin $\alpha$. The notation $[\cdot]_{+}$ indicates that only positive values are kept, so the loss is non-negative. By combining the MSE with the triplet loss, the total loss function becomes:

$$L_{total} = R_{MSE} + L_{triplet} \tag{4}$$
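The following is a minimal PyTorch sketch of how the total loss in Equation (4) could be assembled from Equations (2) and (3). The function name and tensor shapes are illustrative, and the paper uses the Batch All (BA+) variant of the triplet loss [26] rather than the plain per-triplet form shown here.

```python
import torch
import torch.nn.functional as F

def gaitae_loss(x, x_recon, anchor_emb, pos_emb, neg_emb, margin: float = 0.2):
    """Sketch of Equation (4): reconstruction MSE plus triplet loss.
    x, x_recon: input silhouettes and their AE reconstructions g(h(x)).
    anchor_emb, pos_emb, neg_emb: embeddings f(.) of anchor/positive/negative samples.
    """
    # Equation (2): mean squared reconstruction error
    r_mse = F.mse_loss(x_recon, x)

    # Equation (3): triplet loss with margin alpha, clamped at zero ([.]_+)
    d_pos = (anchor_emb - pos_emb).pow(2).sum(dim=-1)
    d_neg = (anchor_emb - neg_emb).pow(2).sum(dim=-1)
    l_triplet = torch.clamp(d_pos - d_neg + margin, min=0.0).sum()

    # Equation (4): total loss
    return r_mse + l_triplet
```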

3.2. Horizontal Occlusion Restriction Strategy

The impact of external changes during walking, such as carrying a bag or wearing a coat, on recognition accuracy is one of the major factors that restrict the practical application of gait recognition. The recognition accuracy of previous studies [3,5,16] is relatively low under these unexpected conditions, possibly due to the lack of an information-sifting ability in these algorithms. We assume that adding occlusion to the areas that are likely to be affected by covariates (e.g., the upper body and lower-middle body) may allow the algorithm to pay more attention to the areas that are less likely to be affected by covariates (e.g., the feet, legs, and head), thus enhancing its information-sifting ability. Existing occlusion strategies mostly use simple rectangular occlusions [1], which tend to separate the human body and result in a lack of correlation between body parts during movement. Unlike these existing strategies, the proposed HOR strategy employs multiple horizontally adjustable rectangular occlusions, which better simulate the complexity of real-world occlusions, thereby maintaining the correlation between body parts and preventing the loss of important features.
For the $t$th frame in a gait sequence with height $H$ and width $W$, the matrix of the corresponding horizontal occlusion restriction is:

$$M_t = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}_{[H_1 : H_2,\ W]} \tag{5}$$

where each 0 sets the brightness of the corresponding pixel to black. In general, HOR works best when the occlusion spans the full width of the original sequence, since only then can the covariates be completely eliminated. The vertical extent of the occlusion is controlled by two parameters, $H_1$ and $H_2$ ($H_1 < H_2$): the occluded region covers heights from $H_1$ to $H_2$ across the full width $W$, as shown in Figure 3. Using only a single occlusion area to eliminate covariates causes a continuous loss of information along the height dimension, which breaks the connection between the remaining body parts. To retain this connection, we propose using multiple horizontal occlusion restrictions, as shown in Figure 3. This is achieved by generating a random number $j = 1, 2, \ldots, J$ of occlusions of width $W$, each located from $H_3$ to $H_4$, to construct $M_t^j$, where $j$ indexes the occlusions. The multiple horizontal occlusion restrictions can be represented as follows:

$$M = \left\{ M_t^{j}\left[H_3 : H_4,\ W\right] \;\middle|\; j = 1, 2, \ldots, J;\ t = 1, 2, \ldots, T;\ H_3, H_4 \in \left[H_1, H_2\right] \right\} \tag{6}$$

where the four parameters $H_1$, $H_2$, $H_3$, and $H_4$ are hyperparameters. $H_1$ and $H_2$ are generally set in the middle of the image for better removal of covariates.
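A minimal sketch of the HOR strategy, assuming silhouettes stored as a (frames, height, width) tensor and the hyperparameter values selected later in Section 4.4. The function name and the choice to share occlusion positions across frames are illustrative assumptions rather than details taken from the paper.

```python
import torch

def horizontal_occlusion_restriction(seq: torch.Tensor,
                                     h1: int = 12, h2: int = 40,
                                     band_height: int = 4, max_bands: int = 2) -> torch.Tensor:
    """Sketch of HOR: zero out full-width horizontal bands at random heights.
    seq: silhouette sequence of shape (frames, height, width), values in [0, 1].
    h1, h2: vertical range allowed for occlusion (hyperparameters H1, H2).
    band_height: occlusion height H4 - H3; max_bands: number of occlusions J.
    """
    out = seq.clone()
    n_bands = torch.randint(1, max_bands + 1, (1,)).item()      # random number of occlusions
    for _ in range(n_bands):
        h3 = torch.randint(h1, h2 - band_height + 1, (1,)).item()
        out[:, h3:h3 + band_height, :] = 0.0                     # full-width black band
    return out
```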

4. Experiments

We evaluate the effectiveness of our proposed method through extensive assessments on the CASIA-B [9], OU-MVLP [10], and SUSTech1K [11] datasets, which offer a diverse range of gait samples. In this section, we first introduce these datasets, and then integrate our method into existing state-of-the-art gait recognition models for comparison with the original models. Furthermore, we conduct comprehensive ablation and parameter sensitivity experiments on the CASIA-B dataset to validate the efficacy of the proposed components. Results are presented in terms of average rank-1 accuracy, excluding identical views.

4.1. Datasets

4.1.1. CASIA-B

The CASIA-B dataset [9] is a comprehensive benchmark for gait analysis, featuring video sequences from 124 subjects captured at 25 frames per second. Each subject has 110 video sequences recorded from 11 viewing angles (ranging from 0° to 180° in 18° increments) under three walking conditions: normal walking (NM), carrying a bag (BG), and wearing a coat (CL). The complexity of these walking conditions increases from NM to BG to CL.
For our experiments, we employ the large-sample training (LT) setup. In the LT configuration, the first 74 subjects (001–074) are used for training, while the remaining 50 subjects (075–124) are reserved for testing. During the testing phase, the first four sequences under the NM condition (NM #1–4) are used as the gallery set. The remaining six sequences are divided into three probe subsets: NM (NM #5–6), BG (BG #1–2), and CL (CL #1–2).
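For clarity, the LT protocol described above can be written down as a small configuration sketch. The subject ID formatting and sequence labels follow the common CASIA-B naming convention (e.g., nm-01) and are shown here only for illustration.

```python
# Sketch of the CASIA-B large-sample training (LT) protocol described above.
train_subjects = [f"{i:03d}" for i in range(1, 75)]    # subjects 001-074 for training
test_subjects  = [f"{i:03d}" for i in range(75, 125)]  # subjects 075-124 for testing

gallery_seqs = ["nm-01", "nm-02", "nm-03", "nm-04"]    # NM #1-4 used as the gallery set
probe_sets = {
    "NM": ["nm-05", "nm-06"],
    "BG": ["bg-01", "bg-02"],
    "CL": ["cl-01", "cl-02"],
}
```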

4.1.2. OU-MVLP

The OU-MVLP dataset [10] is the largest publicly available gait database, featuring video sequences from 10,307 subjects. Each subject has 28 video sequences captured from 14 views (0°, 15°, …, 90°; 180°, 195°, …, 270°), with two samples (indexes #00 and #01) per view. According to the standard protocol, sequences from 5153 subjects are used for training, while the remaining 5154 subjects are reserved for testing. During the testing phase, sequences with index #01 are used as the gallery set, and those with index #00 are used as the probe set.

4.1.3. SUSTech1K

The SUSTech1K dataset [11] is a comprehensive and synchronized multimodal gait dataset that captures a wide range of variations, including normal walking, carrying a bag, changing clothes, different views, object carrying, occlusion, illumination changes, uniform wearing, and using an umbrella. The data are collected using a mobile robot equipped with a 128-beam LiDAR scanner and a monocular camera, allowing for synchronized multimodal data acquisition. The dataset consists of 1050 identities, 25,239 sequences, 763,416 point cloud frames, and 3,075,575 RGB images, each paired with corresponding silhouettes. This dataset serves as a valuable resource for research in the field of gait recognition.

4.2. Training and Testing

During preprocessing, gait silhouette sequences are rescaled to 64 × 44 pixels (height × width) using the technique proposed by Takemura et al. [10]. An Adam optimizer [25] is used with a learning rate of $1 \times 10^{-4}$. The margin in the Batch All (BA+) triplet loss function [26] is set to 0.2. In the training stage, the batch size of the input training data is given by $P \times K = 8 \times 12$, where $P$ is the number of subjects and $K$ is the number of training samples per subject within the batch. The number of frames is $f = 30$. The decoder $g$ is designed with a structure similar to that of the encoder $h$, utilizing nearest-neighbor upsampling in the upsampling layers.
For CASIA-B and SUSTech1K, the model is trained for 160k iterations. For OU-MVLP, the model is trained for 550k iterations, with the learning rate reduced to $1 \times 10^{-5}$ at 450k iterations. In the test stage, all video frames are selected and utilized as test data.
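Below is a minimal PyTorch sketch of a decoder g built with nearest-neighbor upsampling and of the optimizer schedule described above. The channel sizes and layer count are illustrative assumptions, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

# Sketch of a decoder g mirroring a small convolutional encoder h, using
# nearest-neighbor upsampling. Channel sizes are illustrative assumptions.
decoder_g = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.LeakyReLU(inplace=True),
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.LeakyReLU(inplace=True),
    nn.Conv2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
    # assuming the encoder reduced 64 x 44 silhouettes to 16 x 11 feature maps,
    # two x2 upsamplings bring the reconstruction back to 64 x 44
)

# Optimizer and schedule following the training configuration in the text
# (in practice the full model's parameters would be optimized, not only the decoder).
optimizer = torch.optim.Adam(decoder_g.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[450_000], gamma=0.1  # 1e-4 -> 1e-5 at 450k iterations (OU-MVLP)
)
```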

4.3. Effectiveness of GaitAE

We assess the effectiveness and generalizability of our proposed GaitAE by integrating it into existing models and evaluating its impact on their performance. We introduce GaitAE into other cutting-edge gait recognition models and conduct LT experiments on the CASIA-B gait dataset. The models are GaitSet [27], GaitPart [15], GaitGL [18], GaitSlice [3], and GMSN [5]. As seen in Table 1, when GaitAE is introduced, most existing models surpass their original versions. This shows the generalizability and effectiveness of our proposed GaitAE method. Furthermore, we utilize weights trained on the CASIA-B dataset to reconstruct gait sequence images, as depicted in Figure 4. This demonstrates that our method is capable of reconstructing both typical and low-quality sequences across the CASIA-B, OU-MVLP, and SUSTech1K datasets.

4.4. Ablation Study

As seen in Table 2, the two components of the proposed GaitAE framework, namely the AE and HOR, consistently enhance the rank-1 accuracy across various models and datasets. Comparatively, HOR tends to yield greater improvements than AE. This suggests that addressing occlusion and other covariates during training contributes significantly to better recognition performance. While the extent of these improvements varies, they collectively demonstrate the effectiveness of our method in enhancing existing gait recognition techniques and strengthening their robustness against low-quality sequences.
Furthermore, to assess the impact of the occlusion count $J$, the occlusion range $H_1$ and $H_2$, and the occlusion height $H_4 - H_3$ on recognition accuracy, we conduct ablation experiments on HOR using the GaitSlice model, as shown in Table 3. Group 1 shows that the best performance is achieved when $J = 2$, possibly because using two occlusions is most effective in eliminating the impact of covariates, thereby improving accuracy. Group 2 indicates that the optimal performance is achieved when $H_1$ and $H_2$ are set to 12 and 40, respectively, suggesting that this range may contain the most covariates. Group 3 demonstrates that an occlusion height of 4 yields the best results. This is because smaller occlusions are insufficient to eliminate covariates, while larger occlusions may remove useful features. Therefore, we ultimately set $J = 2$, $H_1 = 12$, $H_2 = 40$, and $H_4 - H_3 = 4$ as the final HOR configuration.

4.5. Practicality of GaitAE

In real-world applications of gait recognition, occlusions can be a common challenge. We analyze the average rank-1 accuracy decrease under different settings, as shown in Table 4. The results indicate that models utilizing GaitAE outperform the original models when dealing with square occlusions of side lengths 4, 8, and 16, as shown in Figure 5. These findings emphasize the advantages of GaitAE in occluded scenarios.
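A minimal sketch of the square-occlusion test described above, assuming silhouette frames stored as (height, width) tensors. The function name and the random placement of a single square per frame are illustrative assumptions.

```python
import torch

def add_square_occlusion(frame: torch.Tensor, side: int) -> torch.Tensor:
    """Black out one random square of the given side length (4, 8, or 16 pixels)
    in a silhouette frame of shape (..., height, width)."""
    h, w = frame.shape[-2:]
    top = torch.randint(0, h - side + 1, (1,)).item()
    left = torch.randint(0, w - side + 1, (1,)).item()
    occluded = frame.clone()
    occluded[..., top:top + side, left:left + side] = 0.0
    return occluded
```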

4.6. Robustness of GaitAE

To assess the robustness of GaitAE, we introduce Gaussian noise to the CASIA-B dataset for the GaitSlice and GMSN models. The Gaussian noise simulates common random interference present in real-world scenarios. In our gait sequence instances, we add Gaussian noise with a mean of 0 and a standard deviation selected from the set {0.05, 0.1, 0.15, 0.2, 0.25, 0.3} to each frame. The experimental results, depicted in Figure 6, show varying degrees of accuracy improvement when integrating GaitAE into these models, affirming the robustness of our approach. Notably, when the standard deviation of the Gaussian noise is set to 0.2, the rank-1 accuracy improves by 1.4 and 0.9 in the GaitSlice and GMSN models, respectively. This indicates that at moderate noise levels, GaitAE's reconstruction capability effectively filters out noise while preserving key gait features, demonstrating the model's optimal robustness.
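The noise-injection step used in this robustness test can be sketched as follows, assuming silhouette values lie in [0, 1]; the clamping to the valid range is our assumption rather than a detail stated in the paper.

```python
import torch

def add_gaussian_noise(seq: torch.Tensor, std: float) -> torch.Tensor:
    """Add zero-mean Gaussian noise with the given standard deviation to every
    frame of a silhouette sequence and clamp back to the valid [0, 1] range."""
    noisy = seq + torch.randn_like(seq) * std
    return noisy.clamp(0.0, 1.0)

# Noise levels evaluated in the experiment.
noise_levels = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
```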
Furthermore, we compare the performance of the original model with the model incorporating GaitAE under three walking conditions (NM, BG, CL) on the CASIA-B dataset. The experimental results, as shown in Table 5, clearly demonstrate that accuracy improved under all three conditions after the introduction of GaitAE. Notably, as the walking conditions become more complex, from NM to BG to CL, the magnitude of accuracy improvement increases. This is because GaitAE effectively refines subtle gait features under normal conditions, handles partial occlusions caused by carrying a bag, and mitigates the impact of clothing variations, thereby significantly enhancing the model’s robustness and recognition capability in more challenging scenarios.

5. Conclusions and Future Work

In this paper, we introduced GaitAE, a novel framework that significantly enhances gait recognition accuracy and robustness through two key innovations. The first innovation is the use of an autoencoder mechanism to effectively reconstruct and refine gait features, allowing the model to capture essential details while filtering out noise. The second innovation is the horizontal occlusion restriction (HOR) strategy, which improves performance by reducing the impact of covariates through the introduction of a random number of horizontally placed blocks during training. Our experimental results on the CASIA-B, OU-MVLP, and SUSTech1K datasets validate that GaitAE can be effectively integrated into existing silhouette-based gait recognition models. Experiments simulating real-world occlusion and noise demonstrate that GaitAE can also improve performance under these conditions, proving its practicality and robustness.
Moreover, GaitAE is only applied during the training phase, adding no additional parameters or computational complexity during testing, highlighting its potential for practical applications. However, whether GaitAE can be integrated with other modalities, such as pose, skeleton, and point cloud data, remains a direction for future research.

Author Contributions

Conceptualization, H.L., Y.Q. and H.Z.; Methodology, R.L. and H.L.; Validation, H.L., Y.Q., J.R., W.W.Y.N. and H.Z.; Formal analysis, R.L. and H.L.; Investigation, R.L.; Resources, H.L.; Writing—original draft, R.L. and Y.Q.; Writing—review & editing, H.L., Y.Q., J.R. and W.W.Y.N.; Visualization, H.L. and Y.Q.; Supervision, J.R., W.W.Y.N. and H.Z.; Project administration, H.L.; Funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (62072122), Guangdong province key construction discipline scientific research ability promotion project (2021ZDJS025), Guangdong Postgraduate Education Innovation Plan Project (2020SFKC054), Special Projects in Key Fields of Ordinary Universities of Guangdong Province (2021ZDZX1087), Key Laboratory of Big Data for Intellectual Property of Guangdong Province (2018B030322016), and the 2023 Open Fund of the Key Laboratory of Big Data for Intellectual Property of Guangdong Province (99104072504).

Data Availability Statement

The CASIA-B dataset is available at http://www.cbsr.ia.ac.cn/english/Gait%20Databases.asp, accessed on 20 July 2024. The OU-MVLP dataset is available at http://www.am.sanken.osaka-u.ac.jp/BiometricDB/GaitMVLP.html, accessed on 20 July 2024. The SUSTech1K dataset is available at https://lidargait.github.io/, accessed on 20 July 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Fan, C.; Liang, J.; Shen, C.; Hou, S.; Huang, Y.; Yu, S. Opengait: Revisiting gait recognition towards better practicality. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9707–9716.
2. Wang, T.; Ng, W.W.; Pelillo, M.; Kwong, S. Lissa: Localized stochastic sensitive autoencoders. IEEE Trans. Cybern. 2019, 51, 2748–2760.
3. Li, H.; Qiu, Y.; Zhao, H.; Zhan, J.; Chen, R.; Wei, T.; Huang, Z. Gaitslice: A gait recognition model based on spatio-temporal slice features. Pattern Recognit. 2022, 124, 108453.
4. Qin, H.; Chen, Z.; Guo, Q.; Wu, Q.J.; Lu, M. Rpnet: Gait recognition with relationships between each body-parts. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2990–3000.
5. Wei, T.; Liu, M.; Zhao, H.; Li, H. Gmsn: An efficient multi-scale feature extraction network for gait recognition. Expert Syst. Appl. 2024, 252, 124250.
6. Huang, X.; Wang, X.; He, B.; He, S.; Liu, W.; Feng, B. Star: Spatio-temporal augmented relation network for gait recognition. IEEE Trans. Biom. Behav. Identity Sci. 2022, 5, 115–125.
7. Hou, S.; Liu, X.; Cao, C.; Huang, Y. Gait quality aware network: Toward the interpretability of silhouette-based gait recognition. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 8978–8988.
8. Wang, Z.; Hou, S.; Zhang, M.; Liu, X.; Cao, C.; Huang, Y.; Li, P.; Xu, S. Qagait: Revisit gait recognition from a quality perspective. Proc. AAAI Conf. Artif. Intell. 2024, 38, 5785–5793. Available online: https://ojs.aaai.org/index.php/AAAI/article/view/28391 (accessed on 24 March 2024).
9. Yu, S.; Tan, D.; Tan, T. A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In Proceedings of the IEEE 18th International Conference on Pattern Recognition (ICPR’06), Washington, DC, USA, 20–24 August 2006; Volume 4, pp. 441–444.
10. Takemura, N.; Makihara, Y.; Muramatsu, D.; Echigo, T.; Yagi, Y. Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition. IPSJ Trans. Comput. Vis. Appl. 2018, 10, 1–14.
11. Shen, C.; Fan, C.; Wu, W.; Wang, R.; Huang, G.Q.; Yu, S. Lidargait: Benchmarking 3D gait recognition with point clouds. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1054–1063.
12. Zhang, Z.; Tran, L.; Liu, F.; Liu, X. On learning disentangled representations for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 345–360.
13. Zhao, A.; Li, J.; Ahmed, M. Spidernet: A spiderweb graph neural network for multi-view gait recognition. Knowl.-Based Syst. 2020, 206, 106273.
14. Chao, H.; Wang, K.; He, Y.; Zhang, J.; Feng, J. Gaitset: Cross-view gait recognition through utilizing gait as a deep set. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3467–3478.
15. Fan, C.; Peng, Y.; Cao, C.; Liu, X.; Hou, S.; Chi, J.; Huang, Y.; Li, Q.; He, Z. Gaitpart: Temporal part-based model for gait recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 14225–14233.
16. Huang, T.; Ben, X.; Gong, C.; Xu, W.; Wu, Q.; Zhou, H. Gaitdan: Cross-view gait recognition via adversarial domain adaptation. IEEE Trans. Circuits Syst. Video Technol. 2024, 1.
17. Chen, J.; Wang, Z.; Zheng, C.; Zeng, K.; Zou, Q.; Cui, L. Gaitamr: Cross-view gait recognition via aggregated multi-feature representation. Inf. Sci. 2023, 636, 118920.
18. Lin, B.; Zhang, S.; Yu, X. Gait recognition via effective global-local feature representation and local temporal aggregation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14648–14656.
19. Huang, Z.; Xue, D.; Shen, X.; Tian, X.; Li, H.; Huang, J.; Hua, X.-S. 3D local convolutional neural networks for gait recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14920–14929.
20. Huang, T.; Ben, X.; Gong, C.; Zhang, B.; Yan, R.; Wu, Q. Enhanced spatial-temporal salience for cross-view gait recognition. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6967–6980.
21. Li, G.; Guo, L.; Zhang, R.; Qian, J.; Gao, S. Transgait: Multimodal-based gait recognition with set transformer. Appl. Intell. 2023, 53, 1535–1547.
22. Peng, Y.; Ma, K.; Zhang, Y.; He, Z. Learning rich features for gait recognition by integrating skeletons and silhouettes. Multimed. Tools Appl. 2024, 83, 7273–7294.
23. Hsu, H.-M.; Wang, Y.; Yang, C.-Y.; Hwang, J.-N.; Thuc, H.L.U.; Kim, K.-J. Learning temporal attention based keypoint-guided embedding for gait recognition. IEEE J. Sel. Top. Signal Process. 2023, 17, 689–698.
24. Liu, Y.; Yan, J.; Ouyang, W. Quality Aware Network for Set to Set Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5790–5799.
25. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
26. Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737.
27. Chao, H.; He, Y.; Zhang, J.; Feng, J. Gaitset: Regarding Gait as a Set for Cross-View Gait Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2019; Volume 33, pp. 8126–8133.
Figure 1. Common low-quality gait sequences.
Figure 2. The GaitAE framework. The components within the red dotted lines represent the primary innovations of GaitAE. $X^{(n)}$ represents the $n$th input sample. $h$ is the sequence encoder for extracting frame-level features, $G$ aggregates video features across frames, and $H$ learns a discriminative representation of the training data. $G$ and $H$ together refine the gait features. ⊕ denotes addition.
Figure 3. Horizontal occlusion restriction (HOR). $W$ and $H$ represent the width and height of the frames in gait sequences, respectively. $H_1$, $H_2$, $H_3$, and $H_4$ are parameters that control the position and size of the horizontal occlusion. (a) Complete gait silhouette. (b–e) Schematic diagrams of HOR with different numbers of occlusions and occlusion heights.
Figure 4. The image restoration effect of the proposed method. The input is on the left of the arrow, and the output is on the right.
Figure 5. Square occlusions of side lengths 4, 8, and 16.
Figure 6. Comparison between the original model and the model with the introduction of GaitAE under three walking conditions in CASIA-B: NM, BG, and CL.
Table 1. The average rank-1 accuracies for other models and their combination with GaitAE on the CASIA-B, OU-MVLP, and SUSTech1K datasets.

| Model | CASIA-B Origin | CASIA-B GaitAE | OU-MVLP Origin | OU-MVLP GaitAE | SUSTech1K Origin | SUSTech1K GaitAE |
|---|---|---|---|---|---|---|
| GaitSet [27] | 84.2 | 86.4 ↑2.2 | 87.9 | 88.3 ↑0.4 | 48.4 | 49.0 ↑0.6 |
| GaitPart [15] | 88.8 | 90.3 ↑1.5 | 88.7 | 89.6 ↑0.9 | 47.6 | 48.1 ↑0.5 |
| GaitGL [18] | 91.8 | 92.6 ↑0.8 | 89.7 | 90.0 ↑0.3 | 47.3 | 47.6 ↑0.3 |
| GaitSlice [3] | 90.2 | 91.6 ↑1.4 | 89.3 | 89.7 ↑0.4 | 47.8 | 48.3 ↑0.5 |
| GMSN [5] | 93.7 | 94.1 ↑0.4 | 89.7 | 90.1 ↑0.4 | 50.0 | 50.3 ↑0.3 |
Table 2. The average rank-1 accuracies for other models and their combination with AE or HOR on the CASIA-B, OU-MVLP, and SUSTech1K datasets.

| Model | CASIA-B Origin | CASIA-B AE | CASIA-B HOR | OU-MVLP Origin | OU-MVLP AE | OU-MVLP HOR | SUSTech1K Origin | SUSTech1K AE | SUSTech1K HOR |
|---|---|---|---|---|---|---|---|---|---|
| GaitSet [27] | 84.2 | 85.2 ↑1.0 | 85.7 ↑1.5 | 87.9 | 88.0 ↑0.1 | 88.2 ↑0.3 | 48.4 | 48.6 ↑0.2 | 48.9 ↑0.5 |
| GaitPart [15] | 88.8 | 89.4 ↑0.6 | 89.4 ↑0.6 | 88.7 | 89.1 ↑0.4 | 89.5 ↑0.8 | 47.6 | 47.9 ↑0.3 | 48.0 ↑0.4 |
| GaitGL [18] | 91.8 | 92.0 ↑0.2 | 92.2 ↑0.4 | 89.7 | 89.8 ↑0.1 | 89.9 ↑0.2 | 47.3 | 47.4 ↑0.1 | 47.5 ↑0.2 |
| GaitSlice [3] | 90.2 | 90.7 ↑0.5 | 91.2 ↑1.0 | 89.3 | 89.6 ↑0.3 | 89.6 ↑0.3 | 47.8 | 48.1 ↑0.3 | 48.2 ↑0.4 |
| GMSN [5] | 93.7 | 94.0 ↑0.3 | 93.9 ↑0.2 | 89.7 | 89.8 ↑0.1 | 89.0 ↑0.2 | 50.0 | 50.1 ↑0.1 | 50.1 ↑0.1 |
Table 3. The impact of the number of occlusions $J$, the occlusion range $H_1$ and $H_2$, and the occlusion height $H_4 - H_3$ on recognition accuracy. Bold indicates the best result.

| Group | $H_1$ | $H_2$ | $H_4 - H_3$ | $J$ | Acc |
|---|---|---|---|---|---|
| 1 | 12 | 40 | 4 | 1 | 91.0 |
|   | 12 | 40 | 4 | 2 | **91.2** |
|   | 12 | 40 | 4 | 3 | 91.1 |
| 2 | 8 | 44 | 4 | 2 | 90.8 |
|   | 12 | 40 | 4 | 2 | **91.2** |
|   | 16 | 35 | 4 | 2 | 91.0 |
| 3 | 12 | 40 | 2 | 2 | 90.9 |
|   | 12 | 40 | 4 | 2 | **91.2** |
|   | 12 | 40 | 6 | 2 | 90.5 |
Table 4. The average rank-1 accuracy decreases of different models when adding occlusions of different sizes (lower is better).

| Model | 4 × 4 Origin | 4 × 4 GaitAE | 8 × 8 Origin | 8 × 8 GaitAE | 16 × 16 Origin | 16 × 16 GaitAE |
|---|---|---|---|---|---|---|
| GaitSet [27] | 1.9 | 1.6 ↑0.3 | 5.4 | 5.2 ↑0.2 | 14.3 | 14.1 ↑0.2 |
| GaitPart [15] | 1.5 | 1.1 ↑0.4 | 4.4 | 4.1 ↑0.3 | 12.4 | 12.0 ↑0.4 |
| GaitGL [18] | 0.6 | 0.4 ↑0.2 | 1.7 | 1.6 ↑0.1 | 6.5 | 6.2 ↑0.3 |
| GaitSlice [3] | 1.2 | 1.0 ↑0.2 | 3.9 | 3.6 ↑0.3 | 13.0 | 12.8 ↑0.2 |
| GMSN [5] | 0.3 | 0.2 ↑0.1 | 1.2 | 1.0 ↑0.2 | 5.9 | 5.8 ↑0.1 |
Table 5. The impact of introducing GaitAE on the rank-1 accuracy of existing methods under the NM, BG, and CL conditions.

| Model | NM Origin | NM GaitAE | BG Origin | BG GaitAE | CL Origin | CL GaitAE |
|---|---|---|---|---|---|---|
| GaitSet [27] | 95.0 | 96.1 ↑1.1 | 87.2 | 88.8 ↑1.6 | 70.4 | 74.2 ↑3.8 |
| GaitPart [15] | 96.0 | 96.7 ↑0.7 | 91.5 | 92.8 ↑1.3 | 78.7 | 81.1 ↑2.4 |
| GaitGL [18] | 97.4 | 97.4 ↑0.3 | 94.5 | 95.6 ↑0.9 | 83.6 | 84.9 ↑1.3 |
| GaitSlice [3] | 96.7 | 97.4 ↑0.7 | 92.4 | 93.4 ↑1.0 | 81.6 | 84.2 ↑2.6 |
| GMSN [5] | 98.2 | 98.3 ↑0.1 | 96.0 | 96.5 ↑0.5 | 87.0 | 87.7 ↑0.7 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
