3D human generation is increasingly important in a wide range of applications. However, directly applying 2D generative methods to 3D generation often results in a loss of local detail, while methods that reconstruct geometry from generated images struggle with global view consistency. In this work, we introduce Joint2Human, a novel method that leverages 2D diffusion models to directly generate detailed 3D human geometry, ensuring both global structure and local detail. To achieve this, we employ the Fourier occupancy field (FOF) representation, which enables 2D generative models to directly produce 3D shapes as preliminary results. With the proposed high-frequency enhancer and multi-view recarving strategy, our method seamlessly integrates details from different views into a uniform global shape. To better exploit 3D human priors and enhance control over the generated geometry, we introduce a compact spherical embedding of 3D joints, which enables effective pose guidance during generation. Additionally, our method can generate 3D humans guided by textual inputs. Experimental results demonstrate that our method simultaneously ensures global structure, local detail, high resolution, and low computational cost. More results and the code can be found on our project page at http://cic.tju.edu.cn/faculty/likun/projects/Joint2Human.
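To make the FOF representation concrete: it encodes a 3D occupancy field as a 2D map by expanding occupancy along the depth axis of each pixel as a truncated Fourier series, so a 2D generative model can output the coefficient channels directly. The sketch below is a minimal, illustrative decoder assuming a common channel layout (a DC channel followed by interleaved cosine/sine coefficients, with depth normalized to [-1, 1]); the function name `decode_fof` and the exact layout are assumptions, not the paper's implementation.

```python
import numpy as np

def decode_fof(coeffs, z):
    """Recover occupancy at normalized depth z from per-pixel Fourier coefficients.

    coeffs: array of shape (2K+1, H, W). Channel 0 is the DC term; channels
            2k-1 and 2k hold the order-k cosine and sine coefficients
            (layout is an assumption for illustration).
    z:      scalar depth in [-1, 1].
    Returns an (H, W) map; a point is treated as inside where the value > 0.
    """
    K = (coeffs.shape[0] - 1) // 2
    occ = coeffs[0] / 2.0  # DC term of the series
    for k in range(1, K + 1):
        occ = occ + coeffs[2 * k - 1] * np.cos(np.pi * k * z)
        occ = occ + coeffs[2 * k] * np.sin(np.pi * k * z)
    return occ
```

Evaluating the decoder over a grid of z values and thresholding at zero yields a volumetric occupancy grid, from which a mesh can be extracted with marching cubes.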