Recent diffusion-based text-to-3D methods have made significant progress in 3D human generation. However, they are limited by the underlying text-to-image diffusion models, which lack an understanding of 3D structure. As a result, they struggle to produce high-quality humans and tend to yield over-smoothed geometry and cartoon-like appearances. In this paper, we propose HumanNorm, a novel approach for high-quality and realistic 3D human generation. The main idea is to enhance the model's 2D perception of 3D geometry by learning two normal diffusion models: a normal-adapted diffusion model and a normal-aligned diffusion model. The normal-adapted diffusion model generates high-fidelity normal maps that correspond to user prompts with view-dependent and body-aware text. The normal-aligned diffusion model learns to generate color images aligned with the normal maps, thereby transforming physical geometry details into realistic appearance. Leveraging the proposed normal diffusion models, we devise a progressive geometry generation strategy and a multi-step Score Distillation Sampling (SDS) loss to enhance the performance of 3D human generation. Comprehensive experiments demonstrate that HumanNorm generates 3D humans with intricate geometry and realistic appearances, and that it outperforms existing text-to-3D methods in both geometry and texture quality. The project page of HumanNorm is https://humannorm.github.io/.
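For readers unfamiliar with Score Distillation Sampling, the following is a minimal sketch of the standard single-step SDS gradient on a rendered image. It is not HumanNorm's multi-step variant, whose details the abstract does not give. The function name `sds_grad`, the callable `noise_pred_fn` (a stand-in for a pretrained diffusion model's noise predictor, with prompt and timestep conditioning omitted), and the weighting w(t) = 1 - alpha_bar_t are assumptions made for illustration.

```python
import numpy as np


def sds_grad(x, noise_pred_fn, alpha_bar_t, rng, weight=None):
    """Standard single-step SDS gradient with respect to a rendered image x.

    Hypothetical helper for illustration only; it is not the multi-step
    SDS loss proposed in HumanNorm.
    """
    # Sample Gaussian noise and form the noised image x_t (forward diffusion).
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps

    # Ask the (stand-in) diffusion model to predict the noise in x_t.
    eps_hat = noise_pred_fn(x_t)

    # Assumed weighting w(t) = 1 - alpha_bar_t, a common choice.
    w = (1.0 - alpha_bar_t) if weight is None else weight

    # The SDS gradient skips the diffusion model's Jacobian and uses the
    # weighted residual between predicted and injected noise.
    return w * (eps_hat - eps)
```

In a full pipeline this gradient would be backpropagated through a differentiable renderer to the 3D representation's parameters. Here the renderer is left out, so the result is the gradient with respect to the image itself.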