Parsing is All You Need for Accurate Gait Recognition in the Wild

Binary silhouettes and keypoint-based skeletons have dominated human gait recognition studies for decades because they are easy to extract from video frames. Despite their success in in-the-lab environments, they usually fail in real-world scenarios because they carry low information entropy as gait representations. To achieve accurate gait recognition in the wild, this paper presents a novel gait representation, named Gait Parsing Sequence (GPS). GPSs are sequences of fine-grained human segmentation maps, i.e., human parsing results, extracted from video frames, so they have much higher information entropy and can encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the GPS representation, we propose a novel human parsing-based gait recognition framework, named ParsingGait. ParsingGait contains a Convolutional Neural Network (CNN)-based backbone and two lightweight heads. The first head extracts global semantic features from GPSs, while the second learns the mutual information of part-level features through Graph Convolutional Networks to model the detailed dynamics of human walking. Furthermore, because suitable datasets are lacking, we build the first parsing-based dataset for gait recognition in the wild, named Gait3D-Parsing, by extending the large-scale and challenging Gait3D dataset. Based on Gait3D-Parsing, we comprehensively evaluate our method and existing gait recognition methods. The experimental results show a significant improvement in accuracy brought by the GPS representation and the superiority of ParsingGait. The code and dataset are available at https://gait3d.github.io/gait3d-parsing-hp.
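To make the information-entropy argument concrete, the following toy sketch compares the Shannon entropy of the per-pixel label distribution for a binary silhouette and for a parsing map of the same frame. This is only one simple reading of "information entropy" and is not necessarily the measure the authors have in mind; the 4x4 frame, the four-label set (background, head, torso, legs), and the function name `label_entropy` are all assumptions made for illustration.

```python
import math
from collections import Counter


def label_entropy(frame):
    """Shannon entropy (in bits) of the pixel-label distribution of one frame."""
    counts = Counter(label for row in frame for label in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical 4x4 parsing map: 0 = background, 1 = head, 2 = torso, 3 = legs.
parsing = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [0, 3, 3, 0],
]

# Collapsing every body-part label to 1 gives the binary silhouette
# of the same frame.
silhouette = [[1 if label else 0 for label in row] for row in parsing]

h_sil = label_entropy(silhouette)
h_par = label_entropy(parsing)
print(f"silhouette: {h_sil:.3f} bits, parsing: {h_par:.3f} bits")
```

Because the silhouette is obtained by merging parsing labels, its label entropy can never exceed that of the parsing map; in this toy frame the values are 1.0 and 1.75 bits respectively.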
