We propose a 3D generation pipeline that uses diffusion models to generate realistic digital human avatars. Because human identities, poses, and stochastic details vary so widely, generating 3D human meshes has remained a challenging problem. To address this, we decompose the task into 2D normal map generation and normal-map-based 3D reconstruction. Specifically, we first use a pose-conditional diffusion model to simultaneously generate realistic normal maps of the front and back sides of a clothed human, which we call dual normal maps. For 3D reconstruction, we then "carve" an SMPL-X prior mesh into a detailed 3D mesh by optimizing the mesh to agree with the normal maps. To further enhance high-frequency details, we introduce a diffusion resampling scheme for both the body and facial regions, which encourages more realistic avatars. We also seamlessly incorporate a recent text-to-image diffusion model to support text-based control of human identity. Our method, Chupa, generates realistic 3D clothed humans with better perceptual quality and greater identity variety.
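The normal-map-guided "carving" step can be illustrated with a much-simplified analogue: recover a surface whose normals agree with a target normal map while staying close to a coarse prior. This sketch assumes a single orthographic front view represented as a height field `z[y, x]`, forward-difference normals, a quadratic pull toward the prior with a hypothetical weight `lam`, and plain gradient descent. The functions `normals_from_depth`, `carve_depth`, and `mean_angle_deg`, as well as the synthetic Gaussian "prior" and sinusoidal "clothing detail", are hypothetical stand-ins. Chupa itself optimizes a full mesh initialized from SMPL-X against both the front and back normal maps, so this is an analogy for the carving idea, not its implementation.

```python
import numpy as np


def normals_from_depth(z: np.ndarray) -> np.ndarray:
    """Unit normals of a height field z[y, x] via forward differences.

    Returns an (H-1, W-1, 3) array with n proportional to (-dz/dx, -dz/dy, 1).
    """
    dzdx = z[:-1, 1:] - z[:-1, :-1]
    dzdy = z[1:, :-1] - z[:-1, :-1]
    n = np.stack([-dzdx, -dzdy, np.ones_like(dzdx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def carve_depth(z_prior, target_normals, lam=0.01, step=0.2, iters=1000):
    """Gradient descent on the quadratic loss
        0.5*||dz/dx - p||^2 + 0.5*||dz/dy - q||^2 + 0.5*lam*||z - z_prior||^2,
    where (p, q) are the slopes implied by the target normal map.
    """
    # The stacked difference operator has squared norm <= 8, so this step
    # size guarantees the loss never increases.
    assert step < 2.0 / (8.0 + lam)
    nx, ny, nz = np.moveaxis(target_normals, -1, 0)
    p, q = -nx / nz, -ny / nz  # target slopes dz/dx and dz/dy
    z = np.array(z_prior, dtype=float)  # start from (a copy of) the coarse prior
    losses = []
    for _ in range(iters):
        rx = (z[:-1, 1:] - z[:-1, :-1]) - p  # x-slope residual
        ry = (z[1:, :-1] - z[:-1, :-1]) - q  # y-slope residual
        rp = z - z_prior                     # deviation from the prior
        losses.append(0.5 * (np.sum(rx**2) + np.sum(ry**2) + lam * np.sum(rp**2)))
        # Gradient = D^T r + lam * (z - z_prior), scattered back per pixel.
        grad = lam * rp
        grad[:-1, 1:] += rx
        grad[:-1, :-1] -= rx
        grad[1:, :-1] += ry
        grad[:-1, :-1] -= ry
        z -= step * grad
    return z, losses


def mean_angle_deg(n1: np.ndarray, n2: np.ndarray) -> float:
    """Mean angle in degrees between two arrays of unit normals."""
    cos = np.clip(np.sum(n1 * n2, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


H = W = 32
y, x = np.mgrid[0:H, 0:W].astype(float)
# Hypothetical coarse prior: a smooth bump standing in for a rendered body depth map.
z_prior = 4.0 * np.exp(-((x - 16) ** 2 + (y - 16) ** 2) / 100.0)
# Hypothetical "true" surface: the prior plus high-frequency clothing-like ripples.
z_true = z_prior + 0.5 * np.sin(2 * np.pi * x / 8) * np.sin(2 * np.pi * y / 8)
target = normals_from_depth(z_true)  # stands in for a generated normal map

z, losses = carve_depth(z_prior, target)
before = mean_angle_deg(normals_from_depth(z_prior), target)
after = mean_angle_deg(normals_from_depth(z), target)
print(f"mean normal error: {before:.2f} deg -> {after:.2f} deg")
```

Because the prior already supplies the low-frequency shape, the normal term mainly has to add the high-frequency ripples, which gradient descent resolves quickly; the small `lam` keeps the otherwise undetermined offset (and any unconstrained pixels) anchored to the prior rather than drifting.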