Current unsupervised 2D-3D human pose estimation (HPE) methods fail in multi-person scenarios because of perspective ambiguity in monocular images. We therefore present one of the first studies of the feasibility of unsupervised multi-person 2D-3D HPE from 2D poses alone, focusing on reconstructing human interactions. To address perspective ambiguity, we extend prior work by predicting the camera's elevation angle relative to the subjects' pelvises. This allows us to rotate the predicted poses so that they are level with the ground plane, and it also yields an estimate of the vertical offset in 3D between individuals. Our method lifts each subject's 2D pose to 3D independently and then combines the lifted poses in a shared 3D coordinate system. The poses are then rotated and offset according to the predicted elevation angle before being scaled. These steps alone are sufficient to recover an accurate 3D reconstruction of the subjects' poses. We present results on the CHI3D dataset, introducing its use for unsupervised 2D-3D pose estimation together with three new quantitative metrics, and establishing a benchmark for future research.
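The combine, rotate, offset, and scale steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the choice of the camera x-axis as the rotation axis, the sign convention of the angle, the zero relative depth between subjects, and the single global scale factor are all assumptions made for illustration. The lifted poses and the elevation angle are taken as given inputs, since the abstract does not specify how they are predicted.

```python
import numpy as np


def rotation_about_x(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the x-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])


def combine_and_level(poses_3d, pelvis_2d, elevation_rad, scale=1.0):
    """Hypothetical helper: place lifted poses in one frame and level them.

    poses_3d      : (P, J, 3) pelvis-centred 3D poses, one per subject.
    pelvis_2d     : (P, 2) pelvis positions in the image plane, in the
                    same units as poses_3d (assumption).
    elevation_rad : predicted camera elevation angle in radians.
    scale         : global scale applied after rotation (assumption).
    """
    poses_3d = np.asarray(poses_3d, dtype=float)
    pelvis_2d = np.asarray(pelvis_2d, dtype=float)

    # Shared frame: offset each subject by its image-plane pelvis position
    # relative to the first subject; relative depth is assumed to be zero.
    planar = pelvis_2d - pelvis_2d[0]
    depth = np.zeros((len(pelvis_2d), 1))
    offsets = np.concatenate([planar, depth], axis=1)
    scene = poses_3d + offsets[:, None, :]

    # Rotate the whole scene by the elevation angle. Because the pelvis
    # offsets rotate too, image-plane displacement between subjects is
    # redistributed between the vertical and depth axes.
    rot = rotation_about_x(elevation_rad)
    levelled = scene @ rot.T

    return scale * levelled
```

With an elevation of zero the scene is returned unchanged apart from scaling; a non-zero elevation changes each subject's orientation as well as the vertical and depth separation between the two pelvises.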