Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. However, most of these methods model only the spatial correlations between body joints and ignore temporal correlations, which limits their ability to capture joint relationships in the presence of occlusions and inherent ambiguity. To address this limitation, we propose a spatio-temporal network architecture composed of a joint-mixing multi-layer perceptron block, which facilitates communication among different joints, and a graph weighted Jacobi network block, which enables communication among feature channels. The main novelty of our approach is a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing. We leverage temporal information from 2D pose sequences and integrate weight modulation into the model to disentangle the feature transformations of distinct nodes. We also employ adjacency modulation, which alters the graph topology through a learnable modulation matrix, to learn meaningful correlations beyond the predefined connections between body joints. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our model, which outperforms recent state-of-the-art methods for 3D human pose estimation.
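To make the idea of a weighted Jacobi propagation rule derived from implicit fairing more concrete, the following is a minimal sketch and not the rule proposed in the paper, whose exact form is not given in this abstract. It assumes the standard implicit fairing system (I + sL)H = X with the normalized Laplacian L = I - Â, and applies the textbook weighted Jacobi iteration to it. The function names, the scaling parameter `s`, the relaxation weight `omega`, and the toy 5-joint chain are all hypothetical choices made for illustration; learnable weights, weight modulation, and adjacency modulation are deliberately left out.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}.

    Assumes every node has at least one neighbor (no zero degrees).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def weighted_jacobi_fairing(adj_norm, x, s=1.0, omega=0.8, num_iters=200):
    """Approximately solve (I + s L) H = X with L = I - adj_norm.

    The system matrix is (1 + s) I - s * adj_norm. Because adj_norm has a
    zero diagonal here, its Jacobi splitting has diagonal (1 + s) I, which
    gives the weighted Jacobi update
        H <- (1 - omega) H + omega / (1 + s) * (X + s * adj_norm @ H).
    """
    h = x.copy()
    for _ in range(num_iters):
        h = (1.0 - omega) * h + (omega / (1.0 + s)) * (x + s * (adj_norm @ h))
    return h


# Toy skeleton (hypothetical): a chain of 5 joints with 2 feature channels.
num_joints = 5
adj = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

adj_norm = normalized_adjacency(adj)
rng = np.random.default_rng(0)
x = rng.normal(size=(num_joints, 2))

s = 1.0
h_jacobi = weighted_jacobi_fairing(adj_norm, x, s=s)

# Reference: direct solve of the same implicit fairing system.
laplacian = np.eye(num_joints) - adj_norm
h_exact = np.linalg.solve(np.eye(num_joints) + s * laplacian, x)
```

In this sketch, each iteration mixes a node's current features with those of its neighbors and with the input signal, which is the general shape a propagation rule of this kind would take; a network layer would typically unroll one such step and add learnable parameters, but how the paper does so is not specified here.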