Person re-identification (re-ID) via 3D skeleton data is an emerging topic with prominent advantages. Existing methods usually design skeleton descriptors from raw body joints or perform skeleton sequence representation learning. However, they typically cannot model different body-component relations concurrently, and they rarely explore useful semantics from fine-grained representations of body joints. In this paper, we propose a generic Transformer-based Skeleton Graph prototype contrastive learning (TranSG) approach with structure-trajectory prompted reconstruction to fully capture skeletal relations and valuable spatial-temporal semantics from skeleton graphs for person re-ID. Specifically, we first devise the Skeleton Graph Transformer (SGT) to simultaneously learn body and motion relations within skeleton graphs, so as to aggregate key correlative node features into graph representations. Then, we propose Graph Prototype Contrastive learning (GPC) to mine the most typical graph features (graph prototypes) of each identity, and to contrast the inherent similarity between graph representations and different prototypes at both the skeleton and sequence levels, so as to learn discriminative graph representations. Finally, a graph Structure-Trajectory Prompted Reconstruction (STPR) mechanism is proposed to exploit the spatial and temporal contexts of graph nodes to prompt skeleton graph reconstruction, which facilitates capturing more valuable patterns and graph semantics for person re-ID. Empirical evaluations demonstrate that TranSG significantly outperforms existing state-of-the-art methods. We further show its generality under different graph modeling, RGB-estimated skeletons, and unsupervised scenarios.
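To make the prototype-contrast idea concrete, the following is a minimal NumPy sketch of a generic prototype contrastive loss at a single level. It is not the TranSG implementation: the abstract does not specify how prototypes are computed, so here each prototype is assumed to be the mean feature of an identity, and the function names (`l2_normalize`, `class_prototypes`, `prototype_contrastive_loss`) and the `temperature` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def class_prototypes(features, labels):
    """Assumed prototype rule: the mean feature of each identity."""
    ids = np.unique(labels)  # sorted unique identity labels
    protos = np.stack([features[labels == i].mean(axis=0) for i in ids])
    return ids, protos


def prototype_contrastive_loss(features, labels, temperature=0.1):
    """Softmax cross-entropy over feature-to-prototype cosine similarities.

    Each feature is pulled towards the prototype of its own identity and
    pushed away from the prototypes of all other identities.
    """
    feats = l2_normalize(np.asarray(features, dtype=float))
    labels = np.asarray(labels)
    ids, protos = class_prototypes(feats, labels)
    protos = l2_normalize(protos)

    logits = feats @ protos.T / temperature       # shape (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    target = np.searchsorted(ids, labels)         # column of each own prototype
    return -log_prob[np.arange(len(labels)), target].mean()


if __name__ == "__main__":
    # Toy "graph representations": two identities with two samples each.
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(prototype_contrastive_loss(feats, np.array([0, 0, 1, 1])))
```

In this sketch the loss is small when features of the same identity cluster around their own prototype and larger when identities are mixed. The abstract describes applying such contrast at both the skeleton and sequence levels; the sketch covers one level only.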