2D-to-3D human pose lifting is fundamental to 3D human pose estimation (HPE). Graph Convolutional Networks (GCNs) have been shown to be inherently suitable for modeling the human skeletal topology. However, current GCN-based 3D HPE methods update node features by aggregating information from neighboring nodes, without considering how joints interact under different motion patterns. Although some studies incorporate limb information to learn movement patterns, the latent synergies among joints, such as maintaining balance during motion, are seldom investigated. We propose a hop-wise GraphFormer with intragroup joint refinement (HopFIR) to tackle the 3D HPE problem. HopFIR mainly consists of a novel Hop-wise GraphFormer (HGF) module and an Intragroup Joint Refinement (IJR) module, the latter of which leverages prior limb information to refine peripheral joints. The HGF module groups the joints by $k$-hop neighbors and applies a hop-wise transformer-like attention mechanism among these groups to discover latent joint synergies. Extensive experimental results show that HopFIR outperforms the state-of-the-art (SOTA) methods by a large margin: on the Human3.6M dataset, its mean per joint position error (MPJPE) is 32.67 mm. Furthermore, we demonstrate that previous SOTA GCN-based methods can efficiently benefit from the proposed hop-wise attention mechanism with significant performance gains; for example, SemGCN and MGCN are improved by 8.9% and 4.5%, respectively.
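To make the idea of grouping joints by $k$-hop neighbors concrete, the following is a minimal sketch, not the authors' implementation. It assumes the skeleton is an undirected graph and that a joint's $k$-hop group is the set of joints whose shortest-path distance from it is exactly $k$. The function name `k_hop_groups`, the 7-joint toy skeleton, and its joint numbering are hypothetical and chosen only for illustration; the hop-wise attention itself is not reproduced here.

```python
from collections import deque


def k_hop_groups(num_joints, edges, max_hop):
    """Return groups[j][k - 1] = set of joints exactly k hops from joint j.

    Assumption: "k-hop neighbors" means shortest-path distance k in an
    undirected skeleton graph. This is an illustrative reading only.
    """
    adjacency = {j: set() for j in range(num_joints)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    groups = []
    for source in range(num_joints):
        # Breadth-first search yields the shortest hop distance to each joint.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hop:
                continue  # do not expand beyond the largest hop of interest
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)

        # Bucket the reached joints by hop distance, skipping the source itself.
        hops = [set() for _ in range(max_hop)]
        for joint, d in distance.items():
            if d > 0:
                hops[d - 1].add(joint)
        groups.append(hops)
    return groups


# Hypothetical 7-joint toy skeleton (NOT the Human3.6M joint layout):
# 0 = pelvis, 1 = spine, 2 = head,
# 3 = left hip, 4 = left knee, 5 = right hip, 6 = right knee
TOY_EDGES = [(0, 1), (1, 2), (0, 3), (3, 4), (0, 5), (5, 6)]
toy_groups = k_hop_groups(7, TOY_EDGES, max_hop=2)
```

In this toy skeleton, the 2-hop group of the pelvis contains the head and both knees, joints that are not directly connected to it. Groups of this kind are what a hop-wise attention mechanism could attend over when looking for synergies between non-adjacent joints.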