Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators first estimate 3D skeletons as an intermediate representation, from which dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost when extracting skeletons, leading to mediocre performance. Advanced motion capture systems solve this problem by placing dense physical markers on the body surface, which makes it possible to extract realistic meshes from the markers' non-rigid motions. However, these systems cannot be applied to in-the-wild images, which contain no markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface from large-scale mocap data in a generative style, mimicking the effect of physical markers. The virtual markers can be accurately detected from in-the-wild images and can reconstruct intact meshes with realistic shapes by simple interpolation. Our approach outperforms state-of-the-art methods on three datasets. In particular, it surpasses existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker
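To make the "simple interpolation" step concrete, the following is a minimal sketch, assuming the mesh is recovered as a fixed linear combination of the 64 virtual markers. The abstract does not specify the form of the interpolation, so the convex weight matrix, the toy vertex count, and the function names `make_interpolation_weights` and `interpolate_mesh` are illustrative assumptions rather than the released implementation. The weights here are random placeholders standing in for whatever the method actually learns from mocap data.

```python
import random

NUM_MARKERS = 64     # number of virtual markers stated in the abstract
NUM_VERTICES = 200   # hypothetical toy mesh size, chosen only for illustration


def make_interpolation_weights(num_vertices, num_markers, seed=0):
    """Build placeholder convex weights: each row is non-negative and sums to 1.

    Assumption: a real system would learn these weights from data;
    random values are used here only so the sketch runs end to end.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(num_vertices):
        row = [rng.random() for _ in range(num_markers)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def interpolate_mesh(weights, markers):
    """Compute each mesh vertex as a weighted combination of 3D marker positions."""
    mesh = []
    for row in weights:
        vertex = tuple(
            sum(w * marker[axis] for w, marker in zip(row, markers))
            for axis in range(3)
        )
        mesh.append(vertex)
    return mesh


# Stand-in for marker positions detected from an image (random for the sketch).
rng = random.Random(1)
markers = [
    (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
    for _ in range(NUM_MARKERS)
]

weights = make_interpolation_weights(NUM_VERTICES, NUM_MARKERS)
mesh = interpolate_mesh(weights, markers)
print(len(mesh), "vertices reconstructed from", len(markers), "markers")
```

Because each row of the assumed weight matrix sums to 1, every reconstructed vertex stays inside the bounding box of the markers, and translating all markers translates the mesh by the same offset. This is one reason a marker-based intermediate representation can carry body shape through to the mesh, whereas a sparse skeleton cannot.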