Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Score distillation sampling (SDS) and its variants have greatly advanced text-to-3D generation, but they remain vulnerable to geometry collapse and poor textures. To address this issue, we first analyze SDS in depth and find that its distillation sampling process corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample, which then serves as guidance for optimizing a 3D model. However, the randomness in SDE sampling often yields a diverse and unpredictable sample that is not always less noisy and therefore does not provide consistently correct guidance, which explains the vulnerability of SDS. Since for any SDE there always exists an ordinary differential equation (ODE) whose trajectory sampling deterministically and consistently converges to the same target point as the SDE, we propose a novel and effective method, "Consistent3D", that explores the ODE deterministic sampling prior for text-to-3D generation. Specifically, at each training iteration, given an image rendered by a 3D model, we first estimate its desired 3D score function with a pre-trained 2D diffusion model and build an ODE for trajectory sampling. Next, we design a consistency distillation sampling loss that samples along the ODE trajectory to generate two adjacent samples and uses the less noisy sample to guide the more noisy one, thereby distilling the deterministic prior into the 3D model. Experimental results show the efficacy of Consistent3D in generating high-fidelity and diverse 3D objects and large-scale scenes, as shown in Fig. 1. The code is available at https://github.com/sail-sg/Consistent3D.
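The idea of taking two adjacent samples on a deterministic ODE trajectory and letting the less noisy one guide the more noisy one can be illustrated with a toy sketch. This is not the released implementation: it assumes a one-dimensional Gaussian "data" distribution whose exact denoiser is known in closed form (standing in for the pre-trained 2D diffusion model), an EDM-style probability-flow ODE of the form dx/dt = (x - D(x, t)) / t, a single Euler step, and a plain squared-error gap. All names (`denoise`, `ode_step`, `consistency_loss`, `MU`, `S`) are hypothetical.

```python
import numpy as np

# Assumption: toy "data" distribution N(MU, S^2), a stand-in for the diffusion prior.
MU, S = 2.0, 0.5

def denoise(x, t):
    """Exact posterior mean E[x0 | x_t = x] for Gaussian data and noise std t."""
    return (S**2 * x + t**2 * MU) / (S**2 + t**2)

def ode_step(x, t_from, t_to):
    """One Euler step of the assumed ODE dx/dt = (x - D(x, t)) / t."""
    return x + (t_to - t_from) * (x - denoise(x, t_from)) / t_from

def consistency_loss(x_render, t1, t2, eps):
    """Gap between the estimate at the noisier sample and at its less noisy neighbour."""
    x_t1 = x_render + t1 * eps      # more noisy sample (noise level t1 > t2)
    x_t2 = ode_step(x_t1, t1, t2)   # adjacent, less noisy sample; no new randomness
    target = denoise(x_t2, t2)      # assumption: treated as a fixed guidance target
    pred = denoise(x_t1, t1)        # estimate to be pulled toward the target
    return float(np.mean((pred - target) ** 2)), x_t1, x_t2

rng = np.random.default_rng(0)
eps = rng.standard_normal(4)        # one fixed noise draw, reused below
x_render = np.zeros(4)              # stand-in for a rendered image

loss_a, x_t1_a, x_t2_a = consistency_loss(x_render, 1.0, 0.8, eps)
loss_b, x_t1_b, x_t2_b = consistency_loss(x_render, 1.0, 0.8, eps)
```

With the same rendered input and the same noise draw, the two calls return identical samples and losses, because the ODE step adds no fresh randomness. An SDE-style step would inject new noise at each call, so the guidance target would vary from call to call.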
