In this paper, we define and study a new problem, Cloth2Body, whose goal is to generate 3D human body meshes from a 2D clothing image. Unlike the existing human mesh recovery problem, Cloth2Body must address new challenges raised by the partial observation of the input and the high diversity of the output. There are three specific challenges: first, how to locate and pose a human body inside the clothes; second, how to effectively estimate body shape across various clothing types; and third, how to generate diverse and plausible results from a 2D clothing image. To this end, we propose an end-to-end framework that accurately estimates a 3D body mesh, parameterized by pose and shape, from a 2D clothing image. We first use Kinematics-aware Pose Estimation to estimate body pose parameters: a 3D skeleton is employed as a proxy, followed by an inverse kinematics module, to boost estimation accuracy. We additionally design an adaptive depth trick that better aligns the re-projected 3D mesh with the 2D clothing image by disentangling the effects of object size and camera extrinsics. Next, we propose Physics-informed Shape Estimation to estimate body shape parameters: 3D shape parameters are predicted from partial body measurements estimated from the RGB image, which not only improves pixel-wise human-cloth alignment but also enables flexible user editing. Finally, we design an Evolution-based Pose Generation method, a skeleton transplanting method inspired by genetic algorithms, to generate diverse and reasonable poses during inference. Experimental results on both synthetic and real-world data show that the proposed framework achieves state-of-the-art performance and effectively recovers, from 2D images, natural and diverse 3D body meshes that align well with the clothing.
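The abstract describes pose generation as skeleton transplanting inspired by genetic algorithms but gives no implementation details. The following is only a minimal sketch of what a genetic-algorithm-style crossover over body parts could look like, not the authors' method. Everything in it is an assumption made for illustration: the 24-joint pose layout, the `BODY_PARTS` grouping, the function names `transplant`, `mutate`, and `evolve`, the mutation parameters, and the caller-supplied `fitness` function.

```python
import random

# Hypothetical grouping of 24 joint indices into body parts. This layout is an
# assumption for illustration only; it is not taken from the paper.
BODY_PARTS = {
    "torso": [0, 3, 6, 9, 12, 15],
    "left_leg": [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
    "left_arm": [13, 16, 18, 20, 22],
    "right_arm": [14, 17, 19, 21, 23],
}


def transplant(parent_a, parent_b, rng):
    """Crossover: copy parent_a, then graft whole body parts from parent_b.

    Each pose is a list of per-joint rotations (here, 3 floats per joint).
    Every body part is taken from parent_b with probability 0.5.
    """
    child = [list(rot) for rot in parent_a]
    for joints in BODY_PARTS.values():
        if rng.random() < 0.5:
            for j in joints:
                child[j] = list(parent_b[j])
    return child


def mutate(pose, rng, sigma=0.05, rate=0.1):
    """Add small Gaussian noise to a random subset of rotation components."""
    return [
        [x + rng.gauss(0.0, sigma) if rng.random() < rate else x for x in rot]
        for rot in pose
    ]


def evolve(population, fitness, generations=10, seed=0):
    """Keep the fitter half each generation and refill with mutated children.

    `fitness` is a caller-supplied placeholder (higher is better), for example
    a score of how well a posed body fits the clothing image.
    """
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, size // 2)]
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(transplant(a, b, rng), rng))
        population = parents + children
    return population
```

Swapping whole body parts rather than individual joints keeps each limb internally consistent, which is one plausible reason to call the operation transplanting. A real system would also need a fitness term that penalizes implausible poses.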