Gait recognition is a rapidly advancing vision technique for identifying people at a distance. Prior studies predominantly employed relatively shallow networks to extract subtle gait features, achieving impressive success in constrained settings. Nevertheless, experiments reveal that existing methods mostly produce unsatisfactory results when applied to newly released real-world gait datasets. This paper presents a unified perspective on how to construct deep models for state-of-the-art outdoor gait recognition, covering both classical CNN-based and emerging Transformer-based architectures. Specifically, we challenge the stereotype of shallow gait models and demonstrate the superiority of explicit temporal modeling and deep Transformer structures for discriminative gait representation learning. Consequently, the proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance improvements on Gait3D and GREW. On constrained gait datasets, the DeepGaitV2 series also reaches a new state of the art in most cases, demonstrating its practicality and generality. The source code is available at https://github.com/ShiqiYu/OpenGait.