Gait recognition is a rapidly advancing vision technique for identifying people at a distance. Prior studies predominantly employed relatively small and shallow neural networks to extract subtle gait features, achieving impressive success in indoor settings. However, our experiments reveal that most of these methods produce unsatisfactory results on newly released in-the-wild gait datasets. This paper presents a unified perspective on how to construct deep models for state-of-the-art outdoor gait recognition, covering both classical CNN-based and emerging Transformer-based architectures. In particular, we emphasize the importance of suitable network capacity, explicit temporal modeling, and deep Transformer structures for learning discriminative gait representations. Our proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series achieve significant performance gains in outdoor scenarios, \textit{e.g.}, an improvement of about 30\% in rank-1 accuracy over many state-of-the-art methods on the challenging GREW dataset. We expect this work to further advance the research and application of gait recognition. Code will be available at https://github.com/ShiqiYu/OpenGait.