Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings such as over-saturated colors and excessive smoothness. In this paper, we conduct a thorough analysis of SDS and refine its formulation, finding that its core design is to model the distribution of rendered images. Following this insight, we introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation. This design enables efficient training of the variational distribution by skipping the calculation of the Jacobians in the diffusion U-Net. We also introduce timestep-dependent Distribution Coefficient Annealing (DCA) to further improve distillation precision. Leveraging VDM and DCA, we use Gaussian Splatting as the 3D representation and build a text-to-3D generation framework. Extensive experiments and evaluations demonstrate that VDM and DCA generate high-fidelity, realistic assets with efficient optimization.
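For context, the SDS gradient that the abstract takes as its starting point is commonly written in the form below. This is the standard formulation from the earlier text-to-3D literature, not the refined formulation proposed in this paper, and all of the notation is an assumption introduced here for illustration: $\theta$ denotes the 3D parameters, $x = g(\theta)$ a rendered image, $y$ the text prompt, $\hat{\epsilon}_\phi$ the pretrained diffusion noise predictor, $w(t)$ a timestep weighting, and $\alpha_t, \sigma_t$ the noise schedule.

```latex
% Standard SDS gradient (notation assumed for illustration, not taken from this paper)
\begin{aligned}
x_t &= \alpha_t\, x + \sigma_t\, \epsilon, \qquad x = g(\theta), \quad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  &= \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
     \bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
     \frac{\partial x}{\partial \theta} \right].
\end{aligned}
```

In this form the gradient reaches $\theta$ only through the rendered image $x$. The derivative of the noise predictor with respect to its input $x_t$ does not appear.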