Current unsupervised 2D-3D human pose estimation (HPE) methods do not work in multi-person scenarios because of perspective ambiguity in monocular images. We therefore present one of the first studies of the feasibility of unsupervised multi-person 2D-3D HPE from 2D poses alone, focusing on reconstructing human interactions. To address perspective ambiguity, we extend prior work by predicting the camera's elevation angle relative to the subjects' pelvises. This allows us to rotate the predicted poses so that they are level with the ground plane, while also giving an estimate of the vertical offset in 3D between individuals. Our method independently lifts each subject's 2D pose to 3D and then combines the lifted poses in a shared 3D coordinate system. The poses are then rotated and offset according to the predicted elevation angle, and finally scaled. This alone is enough to recover an accurate 3D reconstruction of the subjects' poses. We present results on the CHI3D dataset, introducing its use for unsupervised 2D-3D pose estimation with three new quantitative metrics and establishing a benchmark for future research.
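The combine, rotate, and scale steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rotate_about_x`, `combine_poses`, `vertical_offset`), the y-up axis convention, the assumption that the elevation is a pure rotation about the camera's x-axis, and the assumption that each subject's pelvis position in the shared camera frame is already available are all hypothetical choices made for illustration.

```python
import math


def rotate_about_x(pose, angle):
    """Rotate every (x, y, z) joint about the x-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in pose]


def combine_poses(root_relative_poses, pelvis_positions, elevation, scale=1.0):
    """Place independently lifted poses in one ground-aligned frame.

    root_relative_poses: one pose per subject, each a list of (x, y, z)
        joints expressed relative to that subject's pelvis (assumed input).
    pelvis_positions: one (x, y, z) pelvis location per subject in the
        shared camera frame (assumed to be known here).
    elevation: predicted camera elevation angle in radians (assumed to be
        a rotation about the x-axis only).
    scale: global scale factor applied last.
    """
    scene = []
    for pose, (px, py, pz) in zip(root_relative_poses, pelvis_positions):
        # 1. Move the pose into the shared camera coordinate system.
        placed = [(x + px, y + py, z + pz) for x, y, z in pose]
        # 2. Undo the camera elevation so the ground plane becomes level.
        levelled = rotate_about_x(placed, -elevation)
        # 3. Apply the global scale.
        scene.append([(scale * x, scale * y, scale * z) for x, y, z in levelled])
    return scene


def vertical_offset(scene, pelvis_index=0):
    """Height (y) difference between the pelvises of the first two subjects."""
    return scene[1][pelvis_index][1] - scene[0][pelvis_index][1]
```

Under these assumptions, two subjects standing on the same level ground at different depths show a non-zero vertical offset in the tilted camera frame, and that offset vanishes once the scene is rotated back by the elevation angle.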