StructLDM: Structured Latent Diffusion for 3D Human Generation

Recent 3D human generative models have achieved remarkable progress by learning 3D-aware GANs from 2D images. However, existing 3D human generative methods model humans in a compact 1D latent space, ignoring the articulated structure and semantics of human body topology. In this paper, we explore a more expressive, higher-dimensional latent space for 3D human modeling and propose StructLDM, a diffusion-based unconditional 3D human generative model that is learned from 2D images. StructLDM addresses the challenges posed by the higher dimensionality of the latent space with three key designs: 1) A semantic structured latent space defined on the dense surface manifold of a statistical human body template. 2) A structured 3D-aware auto-decoder that factorizes the global latent space into several semantic body parts, parameterized by a set of conditional structured local NeRFs anchored to the body template; the latent embeds the properties learned from the 2D training data and can be decoded to render view-consistent humans under different poses and clothing styles. 3) A structured latent diffusion model for generative human appearance sampling. Extensive experiments validate StructLDM's state-of-the-art generation performance and illustrate the expressiveness of the structured latent space over the widely adopted 1D latent space. Notably, StructLDM enables different levels of controllable 3D human generation and editing, including pose/view/shape control, as well as high-level tasks such as compositional generation, part-aware clothing editing, and 3D virtual try-on. Our project page is at: https://taohuumd.github.io/projects/StructLDM/.
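To make concrete why a latent laid out over the body template supports part-level editing, the following is a minimal toy sketch and not the StructLDM implementation. It assumes the structured latent can be pictured as a small 2D grid of feature vectors, with each cell assigned to one semantic body part. The three-part partition, the grid size, and the names `make_part_map`, `sample_latent`, and `compose` are all hypothetical and chosen only for illustration; random Gaussian values stand in for latents that a diffusion model would sample.

```python
import random

def make_part_map(height, width):
    """Toy label map over a template-aligned grid (hypothetical partition):
    top rows are 'head', middle rows 'upper_body', remaining rows 'lower_body'."""
    part_map = []
    for row in range(height):
        if row < height // 4:
            label = "head"
        elif row < (5 * height) // 8:
            label = "upper_body"
        else:
            label = "lower_body"
        part_map.append([label] * width)
    return part_map

def sample_latent(height, width, channels, rng):
    """Stand-in for a sampled structured latent: one feature vector per grid cell."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(width)]
        for _ in range(height)
    ]

def compose(latents_by_part, part_map):
    """Build a new latent by copying each cell from the source latent
    that is assigned to that cell's body part."""
    return [
        [list(latents_by_part[label][r][c]) for c, label in enumerate(row)]
        for r, row in enumerate(part_map)
    ]

rng = random.Random(0)
H, W, C = 8, 8, 4  # illustrative sizes only
part_map = make_part_map(H, W)
person_a = sample_latent(H, W, C, rng)
person_b = sample_latent(H, W, C, rng)

# Keep A's head and lower body, borrow B's upper body (a toy "try-on").
mixed = compose(
    {"head": person_a, "upper_body": person_b, "lower_body": person_a},
    part_map,
)
```

Because every cell has a fixed position on the template, swapping a part reduces to selecting cells by label. A single 1D latent vector offers no such correspondence between entries and body regions, which is the limitation the abstract points to.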
