A long-standing goal of 3D human reconstruction is to create lifelike, fully detailed 3D humans from single images. The main challenge lies in inferring the unknown shape, clothing, and texture of regions that are not visible in the image. To address this, we propose SiTH, a novel pipeline that uniquely integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow. At the core of our method is the decomposition of the ill-posed single-view reconstruction problem into two subproblems: hallucination and reconstruction. For the former, we employ a powerful generative diffusion model to hallucinate the back-view appearance from the input image. For the latter, we leverage skinned body meshes as guidance to recover full-body textured meshes from the input and back-view images. This design enables training the pipeline with only about 500 3D human scans while maintaining its generality and robustness. Extensive experiments and user studies on two 3D reconstruction benchmarks demonstrate the efficacy of our method in generating realistic, fully textured 3D humans from a diverse range of unseen images.