Recent 2D-to-3D human pose estimation methods tend to exploit the graph structure formed by the topology of the human skeleton. However, we argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem. To overcome these weaknesses, we propose a novel graph convolutional network architecture, Hierarchical Graph Networks (HGN). It is based on a denser graph topology generated by our multi-scale graph structure building strategy, and thus provides finer geometric information. The proposed architecture contains three sparse-to-fine representation subnetworks organized in parallel, in which multi-scale graph-structured features are processed and exchange information through a novel feature fusion strategy, leading to rich hierarchical representations. We also introduce a 3D coarse mesh constraint to further boost detail-related feature learning. Extensive experiments demonstrate that HGN achieves state-of-the-art performance with fewer network parameters. Code is released at https://github.com/qingshi9974/BMVC2021-Hierarchical-Graph-Networks-for-3D-Human-Pose-Estimation.
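To make the idea of a denser graph topology concrete, the following is a minimal sketch and not the HGN implementation. The abstract does not specify how the multi-scale graphs are built, so the sketch assumes a simple stand-in: one extra node is inserted at the midpoint of every bone. The 5-joint chain skeleton, the function names (`densify`, `lift_features`, `normalized_adjacency`, `propagate`), and the weight-free averaging step are all hypothetical and chosen only for illustration.

```python
# Hypothetical 5-joint chain skeleton; indices and edges are illustrative only.
SKELETON_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def densify(num_joints, edges):
    """Insert one new node on every bone (assumed stand-in for the paper's
    multi-scale graph structure building strategy)."""
    new_edges = []
    parents = {}  # new node index -> (joint_a, joint_b)
    next_id = num_joints
    for a, b in edges:
        parents[next_id] = (a, b)
        new_edges += [(a, next_id), (next_id, b)]
        next_id += 1
    return next_id, new_edges, parents


def lift_features(feats, parents):
    """Initialise each inserted node with the mean of its two parent joints."""
    out = [list(f) for f in feats]
    for node in sorted(parents):
        a, b = parents[node]
        out.append([(x + y) / 2 for x, y in zip(feats[a], feats[b])])
    return out


def normalized_adjacency(num_nodes, edges):
    """Row-normalized adjacency with self-loops, i.e. D^-1 (A + I)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def propagate(adj, feats):
    """One weight-free propagation step: every node averages its neighbourhood."""
    dim = len(feats[0])
    return [[sum(adj[i][k] * feats[k][d] for k in range(len(feats)))
             for d in range(dim)]
            for i in range(len(adj))]


# Toy 2D joint coordinates for the hypothetical skeleton.
joints_2d = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 2.0], [2.0, 2.0]]
n_dense, dense_edges, parents = densify(5, SKELETON_EDGES)
dense_feats = lift_features(joints_2d, parents)
dense_adj = normalized_adjacency(n_dense, dense_edges)
smoothed = propagate(dense_adj, dense_feats)
```

In this toy setup the 5-node skeleton becomes a 9-node graph, so each propagation step mixes information over shorter spatial distances than on the original skeleton. A learned graph convolution would additionally multiply the features by a trainable weight matrix, which is omitted here to keep the sketch dependency-free.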