Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent quality bottleneck because widely used objectives such as Score Distillation Sampling (SDS) inappropriately omit the U-Net Jacobian for faster generation, introducing significant bias relative to the "true" gradient obtained by full denoising sampling. This bias yields inconsistent update directions, resulting in implausible 3D generations (e.g., color deviation, the Janus problem, and semantically inconsistent details). In this work, we propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel and efficient objective for diffusion-based 3D generation tasks. Specifically, PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing it to approximate the true gradient with minimal sampling steps (1-3). Compared to SDS, PCDS obtains a more accurate update direction at the same sampling cost (1 sampling step), while also enabling few-step (2-3) sampling to trade compute for higher generation quality. For efficient generation, we propose a coarse-to-fine optimization strategy that first uses 1-step PCDS to create the basic structure of a 3D object, then gradually increases the number of PCDS steps to generate fine-grained details. Extensive experiments demonstrate that our approach outperforms the state of the art in generation quality and training efficiency, conspicuously alleviating the implausible 3D generation issues caused by deviated update directions. Moreover, it can be readily applied to many 3D generative applications to yield impressive 3D assets; please see our project page: https://narcissusex.github.io/VividDreamer.
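For context on the omission the abstract refers to: the standard SDS gradient (as introduced in DreamFusion) applies the chain rule but drops the U-Net Jacobian term; a sketch, using the usual notation with rendering $x = g(\theta)$, noise prediction $\hat{\epsilon}_\phi$, prompt $y$, and weighting $w(t)$:

```latex
% Standard SDS update (DreamFusion), for a rendering x = g(\theta):
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  \triangleq \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
% The full chain rule would also carry the U-Net Jacobian factor
% \partial \hat{\epsilon}_\phi(x_t; y, t) / \partial x_t, which SDS omits
% for efficiency; this omission is the source of the gradient bias
% discussed above.
```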