Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification

From arXiv. Accepted by the International Journal of Computer Vision (IJCV). Code is available at https://github.com/Kali-Hac/Hi-MPC. Supplemental materials will be included in the published version.

With rapid advancements in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress and offer many advantages. Most existing solutions learn single-level skeleton features from body joints under the assumption that all skeletons are equally important, and they typically cannot exploit more informative skeleton features from other levels, such as the limb level, which carries more global body patterns. Their dependence on labels also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. First, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features at the levels of body joints, components, and limbs. Then, a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features ("prototypes") from skeletons at different levels. By converting the original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes and thereby capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism that adaptively infers the informative importance of each skeleton, so that the model focuses on harder skeletons and learns more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and to RGB-based scenarios with estimated skeletons.
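To make the three ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 20-joint skeleton and a hypothetical grouping of joints into 10 components and 5 limbs by simple averaging. It uses a generic InfoNCE-style loss against cluster-mean prototypes and omits the meta-prototype transformations. It stands in for hard skeleton mining with a softmax re-weighting of per-sample losses. The names `hierarchy`, `prototype_contrastive_losses`, `hard_weighted_mean`, the index lists, and the temperature value are all illustrative assumptions. The released code at the repository linked above is the authoritative reference.

```python
import numpy as np

# Hypothetical partition of a 20-joint skeleton (assumption, not the paper's).
JOINT_TO_COMPONENT = [[2 * i, 2 * i + 1] for i in range(10)]   # 20 joints -> 10 components
COMPONENT_TO_LIMB = [[2 * i, 2 * i + 1] for i in range(5)]     # 10 components -> 5 limbs


def pool(nodes, groups):
    """Average the nodes in each group: (n, d) -> (len(groups), d)."""
    return np.stack([nodes[g].mean(axis=0) for g in groups])


def hierarchy(joints):
    """Return joint-, component-, and limb-level views of one skeleton."""
    components = pool(joints, JOINT_TO_COMPONENT)
    limbs = pool(components, COMPONENT_TO_LIMB)
    return joints, components, limbs


def prototype_contrastive_losses(features, labels, temperature=0.1):
    """Per-sample InfoNCE-style loss against cluster-mean prototypes.

    `labels` are cluster assignments (pseudo-labels), e.g. from any
    clustering algorithm; no ground-truth identities are needed.
    """
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    cluster_ids = np.unique(labels)                      # sorted unique ids
    prototypes = np.stack([features[labels == c].mean(axis=0) for c in cluster_ids])
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = features @ prototypes.T / temperature       # shape (N, K)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    target = np.searchsorted(cluster_ids, labels)        # column of own prototype
    return -log_prob[np.arange(len(labels)), target]


def hard_weighted_mean(losses):
    """Softmax re-weighting so that higher-loss (harder) samples count more.

    This is an illustrative stand-in for hard skeleton mining.
    """
    weights = np.exp(losses - losses.max())
    weights /= weights.sum()
    return float((weights * losses).sum())
```

In this sketch the same loss could be applied separately to features derived from each of the three levels returned by `hierarchy`. How the levels are actually encoded and combined is left open here, since the abstract does not specify it.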
