Despite substantial progress in 3D human pose estimation from a single-view image, prior works rarely explore global and local correlations, leading to insufficient learning of human skeleton representations. To address this issue, we propose a novel Interweaved Graph and Attention Network (IGANet) that allows bidirectional communication between graph convolutional networks (GCNs) and attention. Specifically, we introduce an IGA module, in which attention is provided with local information from GCNs and GCNs are injected with global information from attention. Additionally, we design a simple yet effective U-shaped multi-layer perceptron (uMLP), which can capture multi-granularity information for body joints. Extensive experiments on two popular benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) are conducted to evaluate our proposed method. The results show that IGANet achieves state-of-the-art performance on both datasets. Code is available at https://github.com/xiu-cs/IGANet.
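To make the idea of bidirectional exchange concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract, not the released implementation. All function names, the single-head attention, the additive injection, the residual connections, the halved bottleneck width of the U-shaped MLP, and the toy 5-joint chain skeleton are assumptions introduced for illustration; the actual layer layout should be taken from the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def normalized_adjacency(edges, num_joints):
    """Symmetrically normalised skeleton adjacency with self-loops."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    deg = a.sum(axis=1)
    return a / np.sqrt(np.outer(deg, deg))


def gcn(x, adj, w):
    """Local branch: each joint aggregates its skeletal neighbours."""
    return np.maximum(adj @ x @ w, 0.0)


def attention(x, wq, wk, wv):
    """Global branch: single-head self-attention over all joints."""
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def iga_block(x, adj, p):
    """Hypothetical interweaving: each branch is fed the other's output."""
    local = gcn(x, adj, p["g1"])
    glob = attention(x, p["q1"], p["k1"], p["v1"])
    # Attention receives local (GCN) information.
    glob_out = attention(x + local, p["q2"], p["k2"], p["v2"])
    # GCN receives global (attention) information.
    local_out = gcn(x + glob, adj, p["g2"])
    return x + local_out + glob_out


def u_mlp(x, w_down, w_mid, w_up):
    """Hypothetical U-shaped MLP: down-project, mix, up-project, with skips."""
    h = np.maximum(x @ w_down, 0.0)      # channels C -> C/2
    h = h + np.maximum(h @ w_mid, 0.0)   # bottleneck with inner skip
    return x + h @ w_up                  # channels C/2 -> C, outer skip


def init_params(channels, rng):
    """Small random weights; shapes are illustrative only."""
    half = channels // 2
    p = {k: 0.1 * rng.standard_normal((channels, channels))
         for k in ("g1", "g2", "q1", "k1", "v1", "q2", "k2", "v2")}
    p["down"] = 0.1 * rng.standard_normal((channels, half))
    p["mid"] = 0.1 * rng.standard_normal((half, half))
    p["up"] = 0.1 * rng.standard_normal((half, channels))
    return p


rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # toy 5-joint chain, not a real skeleton
adj = normalized_adjacency(edges, 5)
x = rng.standard_normal((5, 8))            # 5 joints, 8 feature channels
p = init_params(8, rng)
y = u_mlp(iga_block(x, adj, p), p["down"], p["mid"], p["up"])
print(y.shape)
```

In this sketch the GCN branch only mixes features along skeleton edges, while the attention branch mixes features across all joints; feeding each branch the other's output is one simple way to realise the exchange of local and global information described above.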