Recent advances in generative AI have revealed significant potential for 3D content creation. However, current methods either apply a pre-trained 2D diffusion model with time-consuming score distillation sampling (SDS), or train a direct 3D diffusion model on limited 3D data, which sacrifices generation diversity. In this work, we approach the problem with a multi-view 2.5D diffusion model fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data while retaining the strong generalization ability of the original 2D diffusion model, bridging the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated using the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the almost consistent multi-view normal maps into a consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. Extensive experiments demonstrate that our direct 2.5D generation, combined with the specially designed fusion scheme, achieves diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds. Project page: https://nju-3dv.github.io/projects/direct25.