Gait recognition has made promising advances in controlled settings, yet it still struggles in unconstrained environments because of view changes, occlusions, and varying walking speeds. Moreover, attempts to fuse multiple modalities often yield limited gains because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. The main branch of HiH uses Hierarchical Gait Decomposer (HGD) modules to examine general gait patterns in silhouette data hierarchically, both depth-wise across the network and within each module. This design captures motion hierarchies ranging from overall body dynamics to detailed limb movements, so gait attributes are represented at multiple spatial resolutions. Complementing this, an auxiliary branch based on 2D joint sequences enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module that aligns motion dynamics through learned temporal offsets. Extensive evaluations on diverse indoor and outdoor datasets demonstrate that HiH achieves state-of-the-art performance with a well-balanced trade-off between accuracy and efficiency.
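To make the idea of aligning motion through temporal offsets more concrete, the following is a minimal sketch and not the DTA module itself. It assumes each time step has a fractional offset (fixed by hand here, whereas a network would predict it) and resamples the sequence at the shifted position by linear interpolation. The function name `sample_with_offsets` and the toy data are hypothetical.

```python
import math


def sample_with_offsets(frames, offsets):
    """Resample a feature sequence at positions t + offsets[t].

    frames  : list of feature vectors (lists of floats), one per time step
    offsets : fractional time shifts, one per time step; a hypothetical
              stand-in for offsets that a network would learn
    """
    if len(frames) != len(offsets):
        raise ValueError("need exactly one offset per frame")
    last = len(frames) - 1
    aligned = []
    for t, delta in enumerate(offsets):
        # Clamp the sampling position so it stays inside the sequence.
        pos = min(max(t + delta, 0.0), float(last))
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        w = pos - lo  # interpolation weight of the later frame
        # Linear interpolation between the two neighbouring frames.
        aligned.append(
            [(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])]
        )
    return aligned


# Toy sequence: one feature per frame, growing by 10 per time step.
frames = [[0.0], [10.0], [20.0], [30.0]]
offsets = [0.5, -1.0, 0.25, 2.0]  # assumed values, for illustration only
aligned = sample_with_offsets(frames, offsets)
print(aligned)  # [[5.0], [0.0], [22.5], [30.0]]
```

Linear interpolation keeps the sampled value a continuous function of the offset, which is what would allow offsets to be trained by gradient descent in a differentiable framework. The last frame shows the clamping: an offset of 2.0 points past the end of the sequence, so the final frame is returned unchanged.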