We propose a unified framework for improving the diffusion priors used in 3D generation tasks. Despite the importance of these tasks, existing methods often struggle to produce high-quality results. We begin by examining the inherent limitations of previous diffusion priors and identify a divergence between these priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, our framework iteratively optimizes both the 3D model and the diffusion prior. By exploiting the different learnable parameters of the diffusion prior, our approach offers multiple configurations with different trade-offs between performance and implementation complexity. Experimental results show that our method markedly surpasses existing techniques, establishing a new state of the art in text-to-3D generation. Our approach also performs strongly with both NeRF and the newly introduced 3D Gaussian Splatting backbones. Finally, our framework offers insight into recent score distillation methods, such as the VSD and DDS losses.