Estimating 3D human pose from a single image remains challenging despite the large body of work in this field. Most methods apply neural networks directly and ignore certain constraints (e.g., reprojection, joint angle, and bone length constraints). A few methods do consider these constraints but train the networks separately, so they cannot effectively resolve depth ambiguity. In this paper, we propose a GAN-based model for 3D human pose estimation in which a reprojection network learns the mapping of the distribution from 3D poses to 2D poses, and a discriminator judges 2D-3D consistency. We adopt a novel strategy to train the generator, the reprojection network, and the discriminator synchronously. Furthermore, inspired by the typical kinematic chain space (KCS) matrix, we introduce a weighted KCS matrix and use it as one of the discriminator's inputs to impose joint angle and bone length constraints. Experimental results on Human3.6M show that our method significantly outperforms state-of-the-art methods in most cases.
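
To make the role of the KCS matrix concrete, the following is a minimal sketch of the standard (unweighted) KCS matrix as it is commonly defined: each bone is the vector from a parent joint to a child joint, and the KCS matrix is the Gram matrix of those bone vectors. Its diagonal holds squared bone lengths and its off-diagonal entries hold inner products between bones, which encode joint angles. The function names `bone_vectors` and `kcs_matrix` and the four-joint toy skeleton are illustrative assumptions, not taken from the paper, and the weighting scheme of the proposed weighted KCS matrix is not specified in this abstract, so it is deliberately omitted.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Bone = Tuple[int, int]  # (parent joint index, child joint index)


def bone_vectors(joints: Sequence[Vec3], bones: Sequence[Bone]) -> List[Vec3]:
    """Return one 3D vector per bone, pointing from parent to child."""
    return [
        (
            joints[child][0] - joints[parent][0],
            joints[child][1] - joints[parent][1],
            joints[child][2] - joints[parent][2],
        )
        for parent, child in bones
    ]


def kcs_matrix(joints: Sequence[Vec3], bones: Sequence[Bone]) -> List[List[float]]:
    """Gram matrix of the bone vectors (standard, unweighted KCS).

    Entry (i, i) is the squared length of bone i.
    Entry (i, j) is the inner product of bones i and j, which depends on
    the angle between them.
    """
    vecs = bone_vectors(joints, bones)
    return [
        [sum(u[k] * v[k] for k in range(3)) for v in vecs]
        for u in vecs
    ]


# Hypothetical toy skeleton: a chain of four joints with three
# mutually perpendicular bones.
toy_joints: List[Vec3] = [
    (0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
    (1.0, 1.0, 2.0),
]
toy_bones: List[Bone] = [(0, 1), (1, 2), (2, 3)]

psi = kcs_matrix(toy_joints, toy_bones)
```

In this toy example the three bones are mutually perpendicular, so every off-diagonal entry of `psi` is zero, while the diagonal contains the squared bone lengths. A discriminator that receives such a matrix can inspect bone lengths and angles directly instead of having to infer them from raw joint coordinates.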