We present PrimDiffusion, the first diffusion-based framework for 3D human generation. Devising diffusion models for 3D human generation is difficult because of the heavy computational cost of 3D representations and the articulated topology of 3D humans. To tackle these challenges, our key insight is to run the denoising diffusion process directly on a set of volumetric primitives, which model the human body as a number of small volumes carrying radiance and kinematic information. This volumetric-primitive representation combines the capacity of volumetric representations with the efficiency of primitive-based rendering. PrimDiffusion has three appealing properties: 1) a compact and expressive parameter space for the diffusion model, 2) a flexible 3D representation that incorporates human priors, and 3) decoder-free rendering for efficient novel-view and novel-pose synthesis. Extensive experiments validate that PrimDiffusion outperforms state-of-the-art methods in 3D human generation. Notably, compared to GAN-based methods, PrimDiffusion supports real-time rendering of high-quality 3D humans at a resolution of $512\times512$ once the denoising process is done. We also demonstrate the flexibility of our framework on training-free conditional generation tasks such as texture transfer and 3D inpainting.
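As a rough illustration of what running a diffusion process directly on a set of primitives can look like, the sketch below applies the standard DDPM forward-noising step to a toy primitive set. It is not the PrimDiffusion implementation: the flat per-primitive parameter layout, the linear noise schedule, the sizes, and every function and variable name are assumptions made only for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM choice, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(primitives, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) independently for every primitive parameter.

    primitives: list of K primitives, each a flat list of floats
    (hypothetical layout standing in for radiance and kinematic values).
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        [signal_scale * value + noise_scale * rng.gauss(0.0, 1.0) for value in prim]
        for prim in primitives
    ]


rng = random.Random(0)

# Toy setup: 4 primitives with 8 parameters each (illustrative sizes only).
primitives = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]

alpha_bars = cumulative_alphas(linear_beta_schedule(num_steps=1000))
noisy = add_noise(primitives, t=500, alpha_bars=alpha_bars, rng=rng)
```

Because the noise is applied per parameter, the noised set keeps the same number of primitives and the same per-primitive layout as the clean one, which is what allows a denoiser to operate in the primitive parameter space itself.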