Gait recognition is a biometric technique that identifies individuals by their unique walking styles; it is well suited to unconstrained environments and has a wide range of applications. While current methods focus on exploiting body part-based representations, they often neglect the hierarchical dependencies between local motion patterns. In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework that extracts gait features from coarse to fine. Our framework starts with a hierarchical clustering analysis to recover multi-level body structures, from the whole body down to local details. Next, an adaptive region-based motion extractor (ARME) is designed to learn region-independent motion features. The proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME corresponding to a specific partition level of the hierarchy. An adaptive spatio-temporal pooling (ASTP) module captures gait features at different levels of detail to perform hierarchical feature mapping. Finally, a frame-level temporal aggregation (FTA) module reduces redundant information in gait sequences through multi-scale temporal downsampling. Extensive experiments on the CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate that our method outperforms state-of-the-art methods while maintaining a reasonable balance between model accuracy and complexity.
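To make the coarse-to-fine idea concrete, the following is a minimal sketch and not the HSTL implementation. It assumes several things the abstract does not state: the body hierarchy is approximated by uniform horizontal strips (1, 2, 4, ... regions per level) instead of regions recovered by hierarchical clustering; each region is summarized by a fixed max-plus-mean pooling over space and time instead of the learned ARME and ASTP modules; and temporal downsampling is a plain non-overlapping max over frames at a single window size. All function names are hypothetical.

```python
from statistics import fmean


def partition_rows(height, levels):
    """Per level k, split rows [0, height) into 2**k equal horizontal strips.

    Assumption: uniform strips stand in for a clustering-derived hierarchy.
    Requires height >= 2 ** (levels - 1) so that no strip is empty.
    """
    hierarchy = []
    for level in range(levels):
        parts = 2 ** level
        bounds = [i * height // parts for i in range(parts + 1)]
        hierarchy.append([(bounds[i], bounds[i + 1]) for i in range(parts)])
    return hierarchy


def pool_region(sequence, start, end):
    """Max + mean over all frames, rows [start, end), and all columns."""
    values = [v for frame in sequence for row in frame[start:end] for v in row]
    return max(values) + fmean(values)


def hierarchical_descriptor(sequence, levels=3):
    """One pooled value per region, grouped by level (coarse to fine)."""
    height = len(sequence[0])
    return [
        [pool_region(sequence, start, end) for start, end in regions]
        for regions in partition_rows(height, levels)
    ]


def temporal_max_pool(sequence, window):
    """Element-wise max over non-overlapping windows of `window` frames.

    Trailing frames that do not fill a whole window are dropped.
    """
    rows, cols = len(sequence[0]), len(sequence[0][0])
    pooled = []
    for t in range(0, len(sequence) - window + 1, window):
        chunk = sequence[t:t + window]
        pooled.append(
            [[max(f[r][c] for f in chunk) for c in range(cols)] for r in range(rows)]
        )
    return pooled


# Toy input: 4 frames of a 4x2 map where the value at frame t, row r is t + r.
frames = [[[t + r for _ in range(2)] for r in range(4)] for t in range(4)]
descriptor = hierarchical_descriptor(frames, levels=3)
shortened = temporal_max_pool(frames, window=2)
```

Here `descriptor` holds one list per level, with 1, 2, and 4 entries respectively, so coarser levels summarize the whole body while finer levels isolate local strips. Calling `temporal_max_pool` with several window sizes and combining the results would be one way to approximate the multi-scale behavior described above; how FTA actually combines scales is not specified in the abstract.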