DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Recent 3D generative models rely on clean, well-aligned 3D data, which limits them to single-class or few-class generation. In contrast, our model is trained directly on extensive noisy and unaligned "in-the-wild" 3D assets, mitigating the key challenge in large-scale 3D generation (i.e., data scarcity). In particular, DIRECT-3D is a tri-plane diffusion model that integrates two innovations: 1) A novel learning framework in which noisy data are filtered and aligned automatically during training. Specifically, after an initial warm-up phase on a small set of clean data, an iterative optimization is introduced into the diffusion process to explicitly estimate the 3D pose of objects and to select beneficial data based on conditional density. 2) An efficient 3D representation, achieved by disentangling object geometry and color features with two separate conditional diffusion models that are optimized hierarchically. Given a text prompt, our model generates high-quality, high-resolution, realistic, and complex 3D objects with accurate geometric details in seconds. We achieve state-of-the-art performance in both single-class generation and text-to-3D generation. We also demonstrate that DIRECT-3D can serve as a useful 3D geometric prior of objects, for example to alleviate the well-known Janus problem in 2D-lifting methods such as DreamFusion. The code and models are available for research purposes at: https://github.com/qihao067/direct3d.
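
To make the align-and-filter idea in innovation 1) concrete, the toy sketch below may help. It is not the DIRECT-3D implementation: it assumes 2D point sets instead of tri-plane NeRF features, a plain grid search over candidate rotations instead of the iterative optimization inside the diffusion process, and a mean squared distance to a reference shape as a stand-in for the conditional density given by the warmed-up model. All names (`rotate`, `proxy_loss`, `align_and_filter`, `threshold`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rotate(points: Sequence[Point], angle: float) -> List[Point]:
    """Rotate 2D points counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def proxy_loss(points: Sequence[Point], reference: Sequence[Point]) -> float:
    """Hypothetical stand-in for the model's score of a sample:
    mean squared distance to a reference shape (lower is better)."""
    total = sum((px - rx) ** 2 + (py - ry) ** 2
                for (px, py), (rx, ry) in zip(points, reference))
    return total / len(reference)


def align_and_filter(samples, reference, candidate_angles, threshold):
    """For each sample, pick the candidate pose with the lowest loss,
    then keep the sample only if that best loss is below `threshold`."""
    kept = []
    for pts in samples:
        best_angle = min(candidate_angles,
                         key=lambda a: proxy_loss(rotate(pts, a), reference))
        best_loss = proxy_loss(rotate(pts, best_angle), reference)
        if best_loss <= threshold:  # sample looks beneficial: keep it, aligned
            kept.append((rotate(pts, best_angle), best_angle, best_loss))
    return kept


# Toy usage: one misaligned but clean sample, one sample that fits no pose.
reference = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0)]
misaligned = rotate(reference, math.pi / 2)
junk = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
angles = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
kept = align_and_filter([misaligned, junk], reference, angles, threshold=0.1)
```

In this toy setup the misaligned sample is recovered by the rotation that undoes its offset, while the junk sample is discarded because no candidate pose brings its loss under the threshold. In the paper's setting the score would come from the diffusion model itself rather than from a fixed reference shape.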
