Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators propose to estimate 3D skeletons as intermediate representations, from which dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost when skeletons are extracted, leading to mediocre performance. Advanced motion capture systems solve this problem by placing dense physical markers on the body surface, which allows realistic meshes to be extracted from the markers' non-rigid motions. However, such systems cannot be applied to in-the-wild images, which contain no markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface from large-scale mocap data in a generative style, mimicking the effects of physical markers. The virtual markers can be accurately detected from in-the-wild images and can reconstruct intact meshes with realistic shapes by simple interpolation. Our approach outperforms state-of-the-art methods on three datasets. In particular, it surpasses existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker.
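The abstract says the detected virtual markers reconstruct the full mesh "by simple interpolation". The sketch below shows one way such a step can look, assuming every mesh vertex is a fixed linear combination of the K detected 3D markers. The function `interpolate_mesh` and the weight matrix are hypothetical illustrations, not code from the VirtualMarker repository. In a real system K would be 64 and the weights would come from training data; the values in the usage example are made up.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def interpolate_mesh(
    markers: Sequence[Point3],
    weights: Sequence[Sequence[float]],
) -> List[Point3]:
    """Hypothetical sketch: build mesh vertices from detected markers.

    markers: K detected 3D virtual markers (x, y, z).
    weights: N x K interpolation matrix (assumed given); row n holds the
             coefficients that combine the K markers into vertex n.
    Returns the N interpolated vertices.
    """
    k = len(markers)
    vertices: List[Point3] = []
    for row in weights:
        if len(row) != k:
            raise ValueError("each weight row must have one entry per marker")
        # Each coordinate of the vertex is a weighted sum of the markers' coordinates.
        x = sum(w * m[0] for w, m in zip(row, markers))
        y = sum(w * m[1] for w, m in zip(row, markers))
        z = sum(w * m[2] for w, m in zip(row, markers))
        vertices.append((x, y, z))
    return vertices


if __name__ == "__main__":
    # Three toy markers and two toy vertices (illustrative values only).
    toy_markers = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    toy_weights = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
    print(interpolate_mesh(toy_markers, toy_weights))
```

Because each vertex depends only on the markers through fixed weights, the per-image work after detection is a single matrix product. When each weight row sums to one, shifting all markers by the same offset shifts every vertex by that offset.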