Gait recognition is emerging as a promising technology and an innovative field within computer vision, with a wide range of applications in remote human identification. However, existing methods typically rely on complex architectures to extract features directly from images and apply pooling operations to obtain sequence-level representations. Such designs often lead to overfitting on static noise (e.g., clothing), while failing to effectively capture dynamic motion regions such as the arms and legs. This bottleneck is particularly challenging in the presence of intra-class variation, where gait features of the same individual under different environmental conditions lie far apart in the feature space. To address these challenges, we present a Language-guided and Motion-aware gait recognition framework, named LMGait. To the best of our knowledge, LMGait is the first method to introduce natural language descriptions as explicit semantic priors into the gait recognition task. In particular, we design gait-related language cues to capture key motion features in gait sequences. To improve cross-modal alignment, we propose the Motion Awareness Module (MAM), which refines the language features by adaptively adjusting multiple levels of semantic information to ensure better alignment with the visual representations. Furthermore, we introduce the Motion Temporal Capture Module (MTCM) to enhance the discriminative capability of gait features and improve the model's motion-tracking ability. Extensive experiments across multiple datasets demonstrate the significant advantages of the proposed network: our model achieves accuracies of 88.5%, 97.1%, and 97.5% on the CCPG, SUSTech1K, and CASIA-B datasets, respectively, establishing state-of-the-art performance. Homepage: https://dingwu1021.github.io/LMGait/