Monocular 3D human pose estimation from RGB images has attracted significant attention in recent years. However, recent models depend on supervised training with 3D pose ground truth data or known pose priors for their target domains. 3D pose data is typically collected with motion capture devices, which severely limits the applicability of these models. In this paper, we present a heuristic weakly supervised 3D human pose (HW-HuP) solution to estimate 3D poses when no ground truth 3D pose data is available. HW-HuP learns partial pose priors from 3D human pose datasets and uses easy-to-access observations from the target domain to estimate 3D human pose and shape in an optimization and regression cycle. We employ depth data for weak supervision during training, but not during inference. We show that HW-HuP meaningfully improves upon state-of-the-art models in two practical settings where 3D pose data can hardly be obtained: human poses in bed, and infant poses in the wild. Furthermore, we show that HW-HuP retains comparable performance to cutting-edge models on public benchmarks, even when such models train on 3D pose data.