Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings such as over-saturated colors and excessive smoothness. In this paper, we conduct a thorough analysis of SDS, refine its formulation, and find that its core design is to model the distribution of rendered images. Following this insight, we introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution-modeling process by treating rendered images as degraded instances of diffusion-based generation. This design enables efficient training of the variational distribution by skipping the Jacobian computations in the diffusion U-Net. We further introduce timestep-dependent Distribution Coefficient Annealing (DCA) to improve distillation precision. Leveraging VDM and DCA, we adopt Gaussian Splatting as the 3D representation and build a text-to-3D generation framework. Extensive experiments and evaluations demonstrate that VDM and DCA produce high-fidelity, realistic assets with efficient optimization.
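For context, the Jacobian-skipping mentioned above builds on the standard SDS gradient popularized by DreamFusion; the sketch below shows that well-known form (under its usual notation, not necessarily this paper's exact symbols):

```latex
% Standard SDS gradient sketch: \theta are the 3D scene parameters,
% x = g(\theta) is the rendered image, \epsilon_\phi is the diffusion
% U-Net's noise prediction conditioned on text prompt y at timestep t.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
    \bigl(\epsilon_\phi(x_t; y, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \right],
  \qquad x = g(\theta),\quad x_t = \alpha_t x + \sigma_t \epsilon .
% The U-Net Jacobian \partial \epsilon_\phi / \partial x_t is omitted
% from this gradient, so no backpropagation through the diffusion
% model is needed -- the property the abstract's VDM design exploits.
```

Here $w(t)$ is a timestep-dependent weight; omitting the U-Net Jacobian is what makes the distillation update cheap, at the cost of the over-saturation and over-smoothing artifacts the paper sets out to address.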