With the explosive growth of available training data, single-image 3D human modeling is on the cusp of a transition to a data-centric paradigm. A key to successfully exploiting data scale is to design flexible models that can be supervised by heterogeneous data sources produced by different researchers or vendors. To this end, we propose a simple yet powerful paradigm for seamlessly unifying different human pose and shape-related tasks and datasets. Our formulation is centered on the ability -- both at training and test time -- to query any point of the human volume and obtain its estimated 3D location. We achieve this by learning a continuous neural field of body point localizer functions, each of which is a differently parameterized 3D heatmap-based convolutional point localizer (detector). To generate parametric output, we propose an efficient post-processing step that fits SMPL-family body models to nonparametric joint and vertex predictions. With this approach, we can naturally exploit differently annotated data sources, including mesh, 2D/3D skeleton, and dense pose annotations, without converting between them, and thereby train large-scale 3D human mesh and skeleton estimation models that considerably outperform the state of the art on several public benchmarks, including 3DPW, EMDB, EHF, SSP-3D, and AGORA.
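The abstract mentions 3D heatmap-based point localizers. A common differentiable way to read a continuous coordinate off such a heatmap is the soft-argmax (the expected coordinate under a softmax over the volume); the following is a minimal NumPy sketch of that operation under assumed voxel-grid conventions, not the authors' implementation:

```python
import numpy as np

def soft_argmax_3d(logits):
    """Differentiable 3D coordinate readout from a heatmap.

    logits: array of shape (D, H, W) -- unnormalized scores over a voxel grid.
    Returns the expected (x, y, z) voxel coordinate under softmax(logits).
    """
    # Numerically stable softmax over the whole volume
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    d, h, w = logits.shape
    # Coordinate grids for each axis (z indexes depth, y height, x width)
    zs, ys, xs = np.meshgrid(np.arange(d), np.arange(h), np.arange(w),
                             indexing="ij")
    # Expected coordinate = probability-weighted average of grid positions
    return np.array([(probs * xs).sum(),
                     (probs * ys).sum(),
                     (probs * zs).sum()])
```

Because the readout is a weighted average rather than a hard argmax, gradients flow through it, which is what makes heatmap-based localizers trainable end to end.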