
Yan, J.; Wang, S.; Lin, J.; Li, P.; Zhang, R.; Wang, H. Gait Recognition. Encyclopedia. Available online: https://encyclopedia.pub/entry/52066 (accessed on 04 July 2024).
Gait Recognition

Gait recognition aims to identify a person based on their unique walking pattern. Compared with silhouettes and skeletons, skinned multi-person linear (SMPL) models simultaneously provide human pose and shape information and are robust to viewpoint and clothing variances.

Keywords: gait recognition; skinned multi-person linear (SMPL)

1. Introduction

Gait describes the walking pattern of a person. Unlike other biometrics, e.g., face, fingerprint, or iris, gait can be observed at a distance without the cooperation of the target, which makes it well suited to criminal investigation and social security management. Gait recognition aims to identify the target person by learning their unique walking pattern from video sequences or pictures. By input, it can be categorized into two types: appearance-based methods and model-based methods. The advantages and disadvantages of different gait modalities are shown in Table 1. The mainstream appearance-based methods [1][2][3][4][5] take silhouettes as input and achieve impressive performance in laboratory scenarios [6][7]. However, being sensitive to clothing and viewpoint variances, these methods tend to lose their advantages in uncontrolled environments. Model-based methods [8][9][10][11][12] use articulated human body representations (e.g., skeletons and SMPL) as inputs. These methods are robust to carrying status and clothing variance because they focus on human body structure and movements. Among them, skeletons are sparse representations that are unaware of human shape information, which is, unfortunately, a critical characteristic for identification. Therefore, a viewpoint-robust and dense representation is needed for accurate gait recognition.
Table 1. Advantages and disadvantages of different gait modalities. The representations are taken from the same person at the same timestamp in the Gait3D dataset.
Type       Silhouette   Skeleton   SMPL
Cloth      Sensitive    Sensitive  Robust
Viewpoint  Sensitive    Robust     Robust
Shape      Easy         Hard       Easy
Space      2D           2D/3D      3D

The skinned multi-person linear (SMPL) model [13] makes it possible to overcome the above limitations. The SMPL model parameterizes the human mesh with 3D joint angles for pose and a low-dimensional linear space for shape, and can implicitly provide dense 3D human mesh information. Hence, it is invariant to viewpoint and clothing interference, making it a more suitable representation for gait recognition. Zheng et al. introduce SMPLGait [11], which utilizes a multi-layer perceptron (MLP) network to extract SMPL features and then aggregates them with silhouette features for gait recognition. They show that incorporating the SMPL modality can improve the accuracy of gait recognition. However, the effectiveness of using SMPL alone, without the additional input of silhouettes, has not yet been demonstrated. The dense shape information provided by silhouettes may impede the network's ability to excavate the shape information from SMPLs.

2. Gait Recognition Methods

Gait recognition can be categorized into two main categories: appearance-based methods and model-based methods. The former typically employs silhouette sequences as inputs, while the latter employs human body models including skeletons and meshes.

2.1. Silhouette-Based Gait Recognition

Silhouette-based methods rely on silhouettes obtained through background subtraction from videos. Early approaches [14][15][16][17] use gait energy images (GEIs) as a compressed representation of gait silhouette sequences. Recently, deep CNNs have been applied to learn gait representations [2][4][5][18] and have demonstrated promising performance. For instance, GaitSet [2] proposes a set-pooling technique that regards a sequence of silhouettes as a set, thereby removing the influence of unnecessary sequence order information. Lin et al. propose GaitGL [5] to exploit both global and local features from frames. GLN [4] merges silhouette-level and set-level features in a top-down manner. Hirose et al. [19] leverage an encoder-decoder structure to deform gait silhouette images from videos. Sheth et al. [20] use an eight-layer convolutional neural network to identify human gait. Dou et al. [21] design a framework based on counterfactual intervention learning to focus on the regions that reflect effective walking patterns. Ma et al. [22] propose DANet to simultaneously capture global and local gait motion patterns. Although silhouettes provide informative appearance features, they lose some information about the motion patterns and body structures of humans. Moreover, as 2D representations, silhouettes are susceptible to clothing and viewpoint variances, especially in the wild.
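The set-pooling idea can be sketched in a few lines: any permutation-invariant reduction over per-frame features yields a set-level feature that does not depend on frame order. Element-wise max is shown here; this is a simplified illustration of the principle, not GaitSet's full architecture.

```python
import numpy as np

def set_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Permutation-invariant pooling over a set of per-frame features.

    frame_features: (T, D) array, one D-dim feature per silhouette frame.
    Returns a single (D,) set-level feature. Element-wise max is one
    common choice; mean pooling would work the same way.
    """
    return frame_features.max(axis=0)

# Toy per-frame features for a 3-frame sequence.
feats = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.5]])
pooled = set_pooling(feats)
print(pooled)  # [0.8 0.9]

# The result is identical for any frame ordering, which is the point
# of treating the sequence as an unordered set.
assert np.allclose(set_pooling(feats[::-1]), pooled)
```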

2.2. Skeleton-Based Gait Recognition

Skeletons offer robustness against variations in clothing and viewpoint in gait recognition. Recent advances in human pose estimation have reached high accuracy, which has made skeleton-based approaches increasingly popular [8][9][10]. Liao et al. propose the pose-based temporal-spatial network (PTSN) [23], which leverages pose keypoints for gait recognition; it incorporates a CNN to extract spatial features and an LSTM to extract temporal features. They further generate handcrafted features from the skeleton keypoints, including joint angles, bone lengths, and joint motion, and then learn high-level features using a CNN [8]. Teepe et al. [9] model the human skeleton as a graph and use graph convolutional networks for gait recognition; they also combine higher-order inputs with residual networks [10]. Liu et al. [24] design a symmetry-driven hyper-feature graph convolutional network to automatically learn multiple dynamic patterns and hierarchical semantic features. PoseMapGait [25] exploits pose estimation maps to preserve rich clues about the human body and enhance robustness. Jun et al. [26] combine a graph convolutional network, a recurrent neural network, and an artificial neural network to encode skeleton sequences, joint angle sequences, and gait parameters. Han et al. [27] propose a discontinuous-frame screening module at the front end of the feature extraction stage to filter for information-rich frames. However, it remains challenging to capture a global appearance description of a human using skeleton-based approaches.
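A minimal sketch of the graph-convolution step these skeleton-based methods build on, using a toy three-joint chain; the joint layout, feature dimensions, and weights here are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

def gcn_layer(X: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer over skeleton joints.

    X: (J, C_in) joint features; A: (J, J) adjacency with self-loops;
    W: (C_in, C_out) learnable weights. Applies the symmetric
    normalization D^{-1/2} A D^{-1/2} X W followed by ReLU.
    """
    D = A.sum(axis=1)                       # node degrees
    A_norm = A / np.sqrt(np.outer(D, D))    # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)  # propagate + ReLU

# Toy 3-joint chain (e.g., hip - knee - ankle) with self-loops,
# 2-D joint coordinates as input features, identity weights.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
X = np.array([[0.0, 1.0],
              [0.0, 0.5],
              [0.0, 0.0]])
W = np.eye(2)
H = gcn_layer(X, A, W)  # each joint's feature mixes with its neighbors'
```

Stacking several such layers lets information from distant joints (e.g., hands and feet) interact, which is how these networks model whole-body coordination.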

2.3. SMPL-Based Gait Recognition

The SMPL model is a compelling modality because it overcomes the limitations of the two modalities discussed above. On the one hand, SMPLs record the keypoints of skeletons, which allows a focus on the motion pattern of human gait. On the other hand, SMPLs include human shape information, which is crucial for distinguishing between individuals. Furthermore, the human shape information in SMPLs is low-dimensional, making it less sensitive to human appearance variations. Li et al. [28] propose an end-to-end method for gait recognition through human mesh recovery (HMR), which is the first SMPL-based gait recognition method, and further exploit multi-view constraints to extract more consistent pose sequences [12]. However, they do not focus on human gait priors and do not demonstrate real-world performance. Zheng et al. introduce SMPLGait [11], which is based on accurate SMPL estimations; they use an MLP network to extract SMPL features and then aggregate them with silhouette features for gait recognition. However, these methods do not fully utilize the articulated characteristics of SMPLs and fail to capture the detailed relationships among joints.
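As a rough illustration of the MLP-based SMPL branch described above, the sketch below lifts a per-frame SMPL parameter vector into a feature space and fuses it with a silhouette feature. The 85-dimensional packing (72 pose + 10 shape + 3 global values) and the simple concatenation fusion are assumptions made for this sketch, not SMPLGait's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP that lifts a low-dimensional SMPL parameter
    vector into the same feature space as the silhouette branch."""
    h = np.maximum(x @ W1 + b1, 0.0)  # hidden layer with ReLU
    return h @ W2 + b2

# Hypothetical dimensions: 85-dim SMPL input, 128-dim feature space.
d_smpl, d_hidden, d_feat = 85, 64, 128
W1, b1 = rng.normal(size=(d_smpl, d_hidden)) * 0.1, np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_feat)) * 0.1, np.zeros(d_feat)

smpl_params = rng.normal(size=d_smpl)      # per-frame SMPL vector (stand-in)
silhouette_feat = rng.normal(size=d_feat)  # from the appearance branch (stand-in)

smpl_feat = mlp(smpl_params, W1, b1, W2, b2)
fused = np.concatenate([silhouette_feat, smpl_feat])  # simple concat fusion
```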

3. 3D Human Reconstruction

The 3D human body can be represented in various ways, such as template parameters, meshes, voxels, UV position maps, and probabilistic outputs [29]. Currently, template parameters are the most widely used representation in the research community. A typical template-parameter model is the SMPL model [13], a vertex-based parametric model that factors the mesh deformation into shape and pose parameters. The shape parameters are obtained by performing principal component analysis (PCA) to build a low-dimensional shape space, which helps prevent the gait recognition network from getting bogged down in silhouette details. The SMPL model depicts minimally clothed humans, allowing the restoration of human body pose and appearance to a great extent and making it a favorable modality for gait recognition. Additionally, the SMPL family includes other models such as SMPL-X [30] and SMPL-H [31], which extend the SMPL model with detailed hand poses and, in the case of SMPL-X, facial expressions.
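The linear shape space can be illustrated with a toy version of SMPL-style shape blending: each vertex of a template mesh is displaced along PCA shape directions weighted by the shape coefficients. The real model uses 6890 vertices and, typically, 10 shape coefficients; the tiny dimensions here are for readability only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for SMPL's linear shape space.
n_verts, n_betas = 5, 2
template = rng.normal(size=(n_verts, 3))             # mean body mesh
shape_dirs = rng.normal(size=(n_verts, 3, n_betas))  # PCA shape directions

def shaped_mesh(betas: np.ndarray) -> np.ndarray:
    """SMPL-style linear shape blending: displace each vertex along the
    PCA directions, weighted by the low-dimensional shape code."""
    return template + shape_dirs @ betas

betas = np.array([0.5, -1.0])  # low-dimensional shape code
verts = shaped_mesh(betas)

# Setting the shape code to zero recovers the template (mean) body.
assert np.allclose(shaped_mesh(np.zeros(n_betas)), template)
```

Because identity-specific body shape lives entirely in this small coefficient vector, a recognition network can use it directly without touching pixel-level silhouette detail.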

References

  1. Zhang, Z.; Tran, L.; Liu, F.; Liu, X. On learning disentangled representations for gait recognition. IEEE TPAMI 2020, 44, 345–360.
  2. Chao, H.; He, Y.; Zhang, J.; Feng, J. Gaitset: Regarding gait as a set for cross-view gait recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 17 July 2019; Volume 33, pp. 8126–8133.
  3. Fan, C.; Peng, Y.; Cao, C.; Liu, X.; Hou, S.; Chi, J.; Huang, Y.; Li, Q.; He, Z. Gaitpart: Temporal part-based model for gait recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14225–14233.
  4. Hou, S.; Cao, C.; Liu, X.; Huang, Y. Gait lateral network: Learning discriminative and compact representations for gait recognition. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 382–398.
  5. Lin, B.; Zhang, S.; Yu, X. Gait recognition via effective global-local feature representation and local temporal aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 14648–14656.
  6. Yu, S.; Tan, D.; Tan, T. A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20 August 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 4, pp. 441–444.
  7. Takemura, N.; Makihara, Y.; Muramatsu, D.; Echigo, T.; Yagi, Y. Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition. IPSJ CVA 2018, 10, 4.
  8. Liao, R.; Yu, S.; An, W.; Huang, Y. A model-based gait recognition method with body pose and human prior knowledge. Pattern Recognit. 2020, 98, 107069.
  9. Teepe, T.; Khan, A.; Gilg, J.; Herzog, F.; Hörmann, S.; Rigoll, G. Gaitgraph: Graph convolutional network for skeleton-based gait recognition. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2314–2318.
  10. Teepe, T.; Gilg, J.; Herzog, F.; Hörmann, S.; Rigoll, G. Towards a deeper understanding of skeleton-based gait recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 19–20 June 2022; pp. 1569–1577.
  11. Zheng, J.; Liu, X.; Liu, W.; He, L.; Yan, C.; Mei, T. Gait recognition in the wild with dense 3d representations and a benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 20228–20237.
  12. Li, X.; Makihara, Y.; Xu, C.; Yagi, Y. End-to-end model-based gait recognition using synchronized multi-view pose constraint. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 4106–4115.
  13. Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; Black, M.J. SMPL: A skinned multi-person linear model. ACM TOG 2015, 34, 1–6.
  14. Shiraga, K.; Makihara, Y.; Muramatsu, D.; Echigo, T.; Yagi, Y. Geinet: View-invariant gait recognition using a convolutional neural network. In Proceedings of the 2016 international conference on biometrics (ICB), Halmstad, Sweden, 13 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–8.
  15. Han, J.; Bhanu, B. Individual recognition using gait energy image. IEEE TPAMI 2005, 28, 316–322.
  16. Wu, Z.; Huang, Y.; Wang, L.; Wang, X.; Tan, T. A comprehensive study on cross-view gait based human identification with deep cnns. IEEE TPAMI 2016, 39, 209–226.
  17. Mogan, J.; Lee, C.; Lim, K.; Ali, M.; Alqahtani, A. Gait-CNN-ViT: Multi-Model Gait Recognition with Convolutional Neural Networks and Vision Transformer. Sensors 2023, 23, 3809.
  18. Huang, X.; Zhu, D.; Wang, H.; Wang, X.; Yang, B.; He, B.; Liu, W.; Feng, B. Context-sensitive temporal feature learning for gait recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 12909–12918.
  19. Hirose, Y.; Nakamura, K.; Nitta, N.; Babaguchi, N. Anonymization of Human Gait in Video Based on Silhouette Deformation and Texture Transfer. IEEE Trans Inf. Forensics Secur. 2022, 17, 3375–3390.
  20. Sheth, A.; Sharath, M.; Reddy, S.C.; Sindhu, K. Gait Recognition Using Convolutional Neural Network. Int. J. Online Biomed. Eng. 2023, 19.
  21. Dou, H.; Zhang, P.; Su, W.; Yu, Y.; Lin, Y.; Li, X. Gaitgci: Generative counterfactual intervention for gait recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 5578–5588.
  22. Ma, K.; Fu, Y.; Zheng, D.; Cao, C.; Hu, X.; Huang, Y. Dynamic Aggregated Network for Gait Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 22076–22085.
  23. Liao, R.; Cao, C.; Garcia, E.B.; Yu, S.; Huang, Y. Pose-based temporal-spatial network (PTSN) for gait recognition with carrying and clothing variations. In Proceedings of the Biometric Recognition: 12th Chinese Conference, CCBR 2017, Shenzhen, China, 28–29 October 2017; Proceedings 12. Springer International Publishing: Cham, Switzerland, 2017; pp. 474–483.
  24. Liu, X.; You, Z.; He, Y.; Bi, S.; Wang, J. Symmetry-Driven hyper feature GCN for skeleton-based gait recognition. Pattern Recognit. 2022, 125, 108520.
  25. Liao, R.; Li, Z.; Bhattacharyya, S.; York, G. PoseMapGait: A model-based Gait Recognition Method with Pose Estimation Maps and Graph Convolutional Networks. Neurocomputing 2022, 501, 514–528.
  26. Jun, K.; Lee, K.; Lee, S.; Lee, H.; Kim, M. Hybrid Deep Neural Network Framework Combining Skeleton and Gait Features for Pathological Gait Recognition. Bioengineering 2023, 10, 1133.
  27. Han, K.; Li, X. Research Method of Discontinuous-Gait Image Recognition Based on Human Skeleton Keypoint Extraction. Sensors 2023, 23, 7274.
  28. Li, X.; Makihara, Y.; Xu, C.; Yagi, Y.; Yu, S.; Ren, M. End-to-end model-based gait recognition. In Proceedings of the Asian Conference on Computer Vision 2020, Kyoto, Japan, 30 November–4 December 2020.
  29. Tian, Y.; Zhang, H.; Liu, Y.; Wang, L. Recovering 3d human mesh from monocular images: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023.
  30. Pavlakos, G.; Choutas, V.; Ghorbani, N.; Bolkart, T.; Osman, A.A.; Tzionas, D.; Black, M.J. Expressive body capture: 3d hands, face, and body from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019; pp. 10975–10985.
  31. Romero, J.; Tzionas, D.; Black, M.J. Embodied hands: Modeling and capturing hands and bodies together. arXiv 2022, arXiv:2201.02610.
Update Date: 27 Nov 2023