We propose a new self-supervised method for predicting 3D human body pose from a single image. The prediction network is trained on a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses. By minimising the need for annotated data, the method has the potential for rapid application to pose estimation of other articulated structures (e.g. animals). The self-supervision comes from an earlier idea that exploits the consistency of predicted poses under 3D rotation. Our method is a substantial advance on state-of-the-art self-supervised methods in that it trains a mapping directly from images, without limb articulation constraints or any empirical 3D pose prior. We compare performance with state-of-the-art self-supervised methods on benchmark datasets that provide images and ground-truth 3D pose (Human3.6M, MPI-INF-3DHP). Despite the reduced requirement for annotated data, we show that the method outperforms these methods on Human3.6M and matches their performance on MPI-INF-3DHP. Qualitative results on a dataset of human hands show the potential for rapidly learning to predict 3D pose for articulated structures other than the human body.
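To make the rotation-consistency idea concrete, the following is a minimal sketch of the general principle only, not the training procedure of this paper. It assumes orthographic projection, rotation about the vertical axis, and a lifter that maps 2D joints to 3D joints; the names `lift`, `flat_lifter`, `rotation_y`, and `project` are hypothetical and introduced purely for illustration. A predicted 3D pose is rotated, projected back to 2D, lifted again, and rotated back; a consistent lifter should recover the original prediction.

```python
import numpy as np

def rotation_y(angle):
    """Rotation matrix about the vertical (y) axis; an assumed choice of axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(pose_3d):
    """Orthographic projection (an assumption): keep x and y, drop depth."""
    return pose_3d[:, :2]

def rotation_consistency_loss(lift, pose_2d, angle):
    """Mean squared error between a lifted pose and its re-lifted, un-rotated copy.

    `lift` is any callable mapping a (J, 2) array to a (J, 3) array.
    """
    pose_3d = lift(pose_2d)                 # first 3D prediction
    R = rotation_y(angle)
    rotated = pose_3d @ R.T                 # rotate each joint (row vectors)
    relifted = lift(project(rotated))       # project to 2D and lift again
    restored = relifted @ R                 # apply the inverse rotation R^T
    return float(np.mean((restored - pose_3d) ** 2))

def flat_lifter(pose_2d):
    """Hypothetical degenerate lifter that predicts zero depth for every joint."""
    return np.hstack([pose_2d, np.zeros((pose_2d.shape[0], 1))])

# Toy 2D pose with three joints (illustrative values, not real data).
pose_2d = np.array([[0.0, 1.0], [0.5, 0.0], [-0.5, 0.0]])
loss_zero = rotation_consistency_loss(flat_lifter, pose_2d, 0.0)
loss_rot = rotation_consistency_loss(flat_lifter, pose_2d, np.pi / 4)
```

With no rotation the loss vanishes for any deterministic lifter, while the degenerate flat lifter is penalised as soon as the pose is rotated, which is the signal such a loss would provide during training. In practice the lifter would be a trainable network and the loss would be combined with other terms.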