This work addresses 3D human pose reconstruction from single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree (graph) whose nodes correspond to human joints and model their physical limits. Given 2D keypoint detections in the image, we lift the skeleton to 3D using neural networks that predict both joint rotations and bone lengths. These predictions are then combined with skeletal constraints by an FK layer implemented as a network layer in PyTorch. The result is a fast and accurate approach to estimating 3D skeletal pose. Through quantitative and qualitative evaluation, we demonstrate that the method is significantly more accurate than MediaPipe in terms of both per-joint positional error and visual appearance. Furthermore, we demonstrate generalization across different datasets. The PyTorch implementation runs in 100 to 200 milliseconds per image (including CNN detection) on CPU only.
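To make the idea of an FK layer concrete, the following is a minimal sketch of how predicted joint rotations and bone lengths could be composed along a kinematic tree in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: the class name `ForwardKinematics`, the helper `axis_angle_to_matrix`, the axis-angle rotation parameterization, and the `parents` / `rest_directions` inputs are all hypothetical, and joint-limit handling is omitted.

```python
import torch
import torch.nn as nn


def axis_angle_to_matrix(aa: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: (..., 3) axis-angle vectors -> (..., 3, 3) rotation matrices."""
    angle = aa.norm(dim=-1, keepdim=True).clamp_min(1e-8)  # (..., 1), avoids division by zero
    x, y, z = (aa / angle).unbind(-1)                       # unit rotation axis components
    zeros = torch.zeros_like(x)
    # Skew-symmetric cross-product matrix K of the axis.
    K = torch.stack([zeros, -z, y, z, zeros, -x, -y, x, zeros], dim=-1)
    K = K.reshape(*aa.shape[:-1], 3, 3)
    eye = torch.eye(3, dtype=aa.dtype, device=aa.device)
    sin = torch.sin(angle)[..., None]
    cos = torch.cos(angle)[..., None]
    return eye + sin * K + (1.0 - cos) * (K @ K)


class ForwardKinematics(nn.Module):
    """Hypothetical FK layer: local rotations + bone lengths -> 3D joint positions.

    Assumes joints are ordered so that parents[j] < j, with parents[0] == -1 (root).
    rest_directions[j] is the unit direction of the bone from parents[j] to j
    in the rest pose (the root entry is unused).
    """

    def __init__(self, parents, rest_directions):
        super().__init__()
        self.parents = list(parents)
        self.register_buffer(
            "rest_directions", torch.as_tensor(rest_directions, dtype=torch.float32)
        )

    def forward(self, rotations, bone_lengths, root_position=None):
        # rotations: (B, J, 3) axis-angle, bone_lengths: (B, J)
        batch, num_joints, _ = rotations.shape
        local_rot = axis_angle_to_matrix(rotations)               # (B, J, 3, 3)
        offsets = self.rest_directions * bone_lengths[..., None]  # (B, J, 3)
        if root_position is None:
            root_position = rotations.new_zeros(batch, 3)

        global_rot = [local_rot[:, 0]]
        positions = [root_position]
        for j in range(1, num_joints):
            p = self.parents[j]
            # A child sits at its parent's position plus the bone offset,
            # rotated by the parent's accumulated (global) rotation.
            bone = (global_rot[p] @ offsets[:, j, :, None]).squeeze(-1)
            positions.append(positions[p] + bone)
            global_rot.append(global_rot[p] @ local_rot[:, j])
        return torch.stack(positions, dim=1)                      # (B, J, 3)
```

Because every joint position is built from a rigid rotation of a bone of the predicted length, the output skeleton has exactly those bone lengths whatever rotations the network produces, which is one way to read the "valid prediction" claim. All operations are differentiable, so a layer of this kind can sit at the end of a lifting network and be trained with a loss on 3D joint positions.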