Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators estimate 3D skeletons as intermediate representations, from which dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost when extracting skeletons, which leads to mediocre performance. Advanced motion capture systems solve this problem by placing dense physical markers on the body surface, which allows realistic meshes to be extracted from the markers' non-rigid motions. However, such systems cannot be applied to in-the-wild images, which contain no markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface from large-scale mocap data in a generative style, mimicking the effect of physical markers. The virtual markers can be accurately detected from in-the-wild images and can reconstruct intact meshes with realistic shapes by simple interpolation. Our approach outperforms state-of-the-art methods on three datasets. In particular, it surpasses existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker.
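To make the phrase "simple interpolation" concrete, the following is a minimal sketch of one way a dense mesh could be recovered from 64 marker positions: each vertex is a fixed weighted average of the markers. The abstract does not specify the interpolation scheme, so the linear form, the convex (non-negative, sum-to-one) weights, the vertex count, and the function names `make_placeholder_weights` and `interpolate_mesh` are all assumptions made for illustration. The weights here are random placeholders, not learned values.

```python
import numpy as np

NUM_MARKERS = 64       # number of virtual markers, as stated in the abstract
NUM_VERTICES = 6890    # assumption: a mesh with an SMPL-sized vertex count


def make_placeholder_weights(num_vertices, num_markers, rng):
    """Hypothetical stand-in for learned interpolation weights.

    Returns a (num_vertices, num_markers) matrix of non-negative entries
    whose rows each sum to 1, so every vertex is a convex combination
    of the markers.
    """
    raw = rng.random((num_vertices, num_markers))
    return raw / raw.sum(axis=1, keepdims=True)


def interpolate_mesh(markers, weights):
    """Map marker positions (K, 3) to vertex positions (V, 3).

    Assumed form: vertices = weights @ markers, with weights of shape (V, K).
    """
    if markers.shape != (weights.shape[1], 3):
        raise ValueError("markers must have shape (num_markers, 3)")
    return weights @ markers


rng = np.random.default_rng(0)
weights = make_placeholder_weights(NUM_VERTICES, NUM_MARKERS, rng)
markers = rng.normal(size=(NUM_MARKERS, 3))   # placeholder 3D marker detections
vertices = interpolate_mesh(markers, weights)
```

Under these assumptions, reconstruction is a single matrix product, and because each row of the weights sums to 1, translating all markers translates the mesh by the same offset.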