We present a benchmark for 3D human whole-body pose estimation, the task of localizing accurate 3D keypoints on the entire human body, including the face, hands, body, and feet. Because no fully annotated, accurate 3D whole-body dataset currently exists, deep networks are either trained separately on specific body parts and combined at inference, or they rely on pseudo-ground-truth from parametric body models, which is less accurate than detection-based methods. To overcome these issues, we introduce the Human3.6M 3D WholeBody (H3WB) dataset, which provides whole-body annotations for the Human3.6M dataset using the COCO Wholebody layout. H3WB provides annotations of 133 whole-body keypoints on 100K images, made possible by our new multi-view pipeline. We also propose three tasks: i) 3D whole-body pose lifting from a complete 2D whole-body pose, ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose, and iii) 3D whole-body pose estimation from a single RGB image. Additionally, we report baselines from several popular methods for these tasks. Furthermore, we provide automated 3D whole-body annotations of TotalCapture and show experimentally that using them together with H3WB improves performance. Code and dataset are available at https://github.com/wholebody3d/wholebody3d
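As a rough illustration of the 133-keypoint layout and of task ii, the sketch below splits a whole-body pose into part groups and hides one group in the 2D input. The group sizes (17 body, 6 foot, 68 face, 21 per hand) follow the COCO-WholeBody convention. The group ordering, the helper names `part_slices` and `mask_part`, and the zero-fill masking strategy are assumptions made for illustration; they are not the H3WB file format or the benchmark's masking protocol.

```python
from typing import Dict, List, Tuple

Point2D = Tuple[float, float]

# Part sizes follow the COCO-WholeBody convention (133 keypoints in total).
# The ordering used here is an assumption for this sketch.
PART_SIZES: List[Tuple[str, int]] = [
    ("body", 17),
    ("foot", 6),
    ("face", 68),
    ("left_hand", 21),
    ("right_hand", 21),
]


def part_slices() -> Dict[str, slice]:
    """Map each part name to its index range within the flat keypoint list."""
    slices: Dict[str, slice] = {}
    start = 0
    for name, size in PART_SIZES:
        slices[name] = slice(start, start + size)
        start += size
    return slices


def mask_part(pose_2d: List[Point2D], part: str) -> Tuple[List[Point2D], List[bool]]:
    """Return a copy of the 2D pose with one part zeroed out, plus a visibility mask.

    Zero-filling is a hypothetical choice; a lifting model could instead
    consume the visibility mask directly.
    """
    sl = part_slices()[part]
    masked = list(pose_2d)
    visible = [True] * len(pose_2d)
    for i in range(sl.start, sl.stop):
        masked[i] = (0.0, 0.0)
        visible[i] = False
    return masked, visible


# Synthetic 2D pose with 133 keypoints, used only to exercise the helpers.
pose = [(float(i), float(i) + 0.5) for i in range(133)]
masked, visible = mask_part(pose, "left_hand")
```

A model for task ii would receive `masked` (and optionally `visible`) as input and be asked to predict all 133 3D keypoints, including those hidden in the 2D input.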