Training and analyzing 3D body posture models depends heavily on the availability of joint labels, which are commonly acquired through laborious manual annotation of body joints or through marker-based joint localization using carefully curated markers and capture systems. However, such annotations are not always available, especially for people performing unusual activities. In this paper, we propose an algorithm that learns to discover 3D keypoints on human bodies from multi-view images without any supervision or labels other than the constraints that multi-view geometry provides. To ensure that the discovered 3D keypoints are meaningful, they are re-projected to each view and used to estimate the person's mask, which the model itself has initially estimated without supervision. Our approach discovers more interpretable and accurate 3D keypoints than other state-of-the-art unsupervised approaches on the Human3.6M and MPI-INF-3DHP benchmark datasets.
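The re-projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes calibrated pinhole cameras described by 3x4 projection matrices, which the abstract does not specify; the function names `make_camera` and `project_keypoints` and all numeric values are hypothetical and are not taken from the authors' implementation. The sketch only shows how a shared set of 3D keypoints maps to 2D locations in each view, which is the geometric link that multi-view constraints rely on.

```python
import numpy as np


def make_camera(focal, cx, cy, rotation, translation):
    """Build a 3x4 pinhole projection matrix P = K [R | t] (illustrative helper)."""
    intrinsics = np.array([[focal, 0.0, cx],
                           [0.0, focal, cy],
                           [0.0, 0.0, 1.0]])
    extrinsics = np.concatenate([rotation, translation.reshape(3, 1)], axis=1)
    return intrinsics @ extrinsics


def project_keypoints(points_3d, projection_matrices):
    """Project N 3D keypoints into V views.

    points_3d: array of shape (N, 3) in world coordinates.
    projection_matrices: array of shape (V, 3, 4).
    Returns pixel coordinates of shape (V, N, 2).
    """
    n = points_3d.shape[0]
    # Append a homogeneous coordinate of 1 to every point: (N, 4).
    homogeneous = np.concatenate([points_3d, np.ones((n, 1))], axis=1)
    # Apply each view's projection matrix to every point: (V, N, 3).
    projected = np.einsum("vij,nj->vni", projection_matrices, homogeneous)
    # Perspective divide by depth to obtain pixel coordinates.
    return projected[..., :2] / projected[..., 2:3]


# Hypothetical two-view setup: both cameras sit 5 units from the origin,
# the second one rotated 90 degrees about the vertical (y) axis.
rotation_y_90 = np.array([[0.0, 0.0, 1.0],
                          [0.0, 1.0, 0.0],
                          [-1.0, 0.0, 0.0]])
cameras = np.stack([
    make_camera(500.0, 320.0, 240.0, np.eye(3), np.array([0.0, 0.0, 5.0])),
    make_camera(500.0, 320.0, 240.0, rotation_y_90, np.array([0.0, 0.0, 5.0])),
])

# Two made-up 3D keypoints in world coordinates.
keypoints_3d = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0]])
keypoints_2d = project_keypoints(keypoints_3d, cameras)  # shape (2, 2, 2)
```

In a learning setting, the same projection would typically be written in a differentiable framework so that a per-view mask loss can propagate gradients back to the 3D keypoints; the NumPy version above only demonstrates the geometry.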