GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation

Gait recognition is one of the most promising video-based biometric technologies. Silhouette edges and motion are the most informative features, and previous studies have explored them separately with notable results. However, under occlusion and varying viewing angles, the performance of these methods is often limited by their predefined spatial segmentation strategy. Moreover, traditional temporal pooling usually neglects distinctive temporal information in gait. To address these issues, we propose a novel gait recognition framework, denoted GaitASMS, which effectively extracts adaptive structured spatial representations and naturally aggregates multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates the silhouette edges using an adaptive edge mask and maximizes the representation in the semantic latent space. The Multi-Scale Temporal Aggregation Module (MSTA) models long- and short-range temporal information effectively through a temporally aggregated structure. Furthermore, we propose a new data augmentation, termed random mask, to enrich the sample space of long-term occlusion and improve the generalization of the model. Extensive experiments on two datasets demonstrate the competitive advantage of the proposed method, especially in complex scenes, i.e., the bag-carrying (BG) and coat-wearing (CL) conditions. On the CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5\% and outperforms the baseline in rank-1 accuracy by 3.4\% and 6.3\% in BG and CL, respectively. Ablation experiments demonstrate the effectiveness of ASRE and MSTA. The source code is available at https://github.com/YanSungithub/GaitASMS.
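
The abstract does not say how random mask is implemented. As a minimal sketch of one plausible reading, the snippet below zeroes out a single rectangle at the same position in every frame of a silhouette sequence, so that the occlusion persists over time. The function name `random_mask`, the `max_frac` parameter, the rectangular shape, and the `(T, H, W)` layout are all assumptions made for illustration and are not taken from the GaitASMS code.

```python
import numpy as np


def random_mask(silhouettes, max_frac=0.3, rng=None):
    """Zero out one random rectangle at the same place in every frame.

    silhouettes: array of shape (T, H, W) holding a silhouette sequence.
    max_frac:    hypothetical upper bound on the rectangle's height and
                 width, as a fraction of H and W (an assumed parameter).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = silhouettes.shape

    # Sample the rectangle size, at least 1 pixel in each dimension.
    mask_h = rng.integers(1, max(2, int(h * max_frac) + 1))
    mask_w = rng.integers(1, max(2, int(w * max_frac) + 1))

    # Sample the top-left corner so the rectangle stays inside the frame.
    top = rng.integers(0, h - mask_h + 1)
    left = rng.integers(0, w - mask_w + 1)

    # Apply the same mask to all frames to imitate a long-term occlusion.
    out = silhouettes.copy()
    out[:, top:top + mask_h, left:left + mask_w] = 0
    return out


if __name__ == "__main__":
    sequence = np.ones((8, 64, 44), dtype=np.uint8)
    augmented = random_mask(sequence, rng=np.random.default_rng(0))
    print(augmented.shape, int((augmented[0] == 0).sum()))
```

Sampling the rectangle once per sequence, rather than once per frame, is the design choice that makes the occlusion long-term in this sketch. Per-frame sampling would instead resemble short, flickering occlusions.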
