We propose a 3D generation pipeline that uses diffusion models to generate realistic human digital avatars. Generating 3D human meshes is challenging because of the wide variety of human identities, poses, and stochastic details. To address this, we decompose the problem into 2D normal map generation and normal map-based 3D reconstruction. Specifically, we first use a pose-conditional diffusion model to simultaneously generate realistic normal maps for the front and back of a clothed human, dubbed dual normal maps. For 3D reconstruction, we ``carve'' the prior SMPL-X mesh into a detailed 3D mesh according to the normal maps through mesh optimization. To further enhance high-frequency details, we present a diffusion resampling scheme applied to both the body and facial regions, which encourages the generation of realistic digital avatars. We also seamlessly incorporate a recent text-to-image diffusion model to support text-based human identity control. Our method, named Chupa, generates realistic 3D clothed humans with better perceptual quality and identity variety.
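To make the ``carving'' idea concrete, the following is a minimal one-dimensional sketch, not the mesh optimization used by Chupa. It assumes a height profile in place of the SMPL-X mesh, a list of 2D unit normals in place of a generated normal map, and plain gradient descent on the slope mismatch with a weak pull toward the prior shape. All names (`normals_from_heights`, `slopes`, `carve`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def normals_from_heights(z, dx=1.0):
    """Unit normals (nx, nz) of a 1D height profile, one per segment."""
    normals = []
    for a, b in zip(z, z[1:]):
        slope = (b - a) / dx
        norm = math.hypot(slope, 1.0)
        normals.append((-slope / norm, 1.0 / norm))
    return normals

def slopes(z, dx=1.0):
    """Forward-difference slopes of a height profile."""
    return [(b - a) / dx for a, b in zip(z, z[1:])]

def carve(prior, target_normals, dx=1.0, steps=2000, lr=0.2, anchor=1e-3):
    """Deform `prior` so its slopes agree with `target_normals`.

    Minimizes sum_i (slope_i - target_slope_i)^2
              + anchor * sum_i (z_i - prior_i)^2
    by gradient descent. This is an assumed toy objective, not the paper's.
    """
    target_slopes = [-nx / nz for nx, nz in target_normals]
    z = list(prior)
    for _ in range(steps):
        # Gradient of the anchor term that keeps z close to the prior.
        grad = [2.0 * anchor * (zi - pi) for zi, pi in zip(z, prior)]
        # Gradient of the slope-mismatch term, segment by segment.
        for i, s in enumerate(target_slopes):
            r = (z[i + 1] - z[i]) / dx - s
            grad[i] -= 2.0 * r / dx
            grad[i + 1] += 2.0 * r / dx
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

# Toy data: a smooth prior, and normals taken from a "wrinkled" version of it.
n = 16
prior = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
detailed = [p + 0.3 * math.sin(1.5 * i) for i, p in enumerate(prior)]
target = normals_from_heights(detailed)  # stand-in for a generated normal map
carved = carve(prior, target)
```

In this toy setting the carved profile keeps the overall position of the prior while picking up the fine detail encoded in the normals. The step size `lr` is kept below the stability limit of this particular quadratic objective and would need retuning for a different discretization.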