Text-to-3D generation has advanced rapidly since the introduction of Score Distillation Sampling (SDS). However, SDS tends to produce low-quality 3D objects because of over-smoothing. We attribute this issue to two factors: 1) single-step DDPM inference yields poor guidance gradients; 2) the randomness of the input noise and timesteps averages out the details of the 3D content. To address the issue, we propose DreamLCM, which incorporates the Latent Consistency Model (LCM). DreamLCM leverages the strong image generation capability of LCM to produce consistent, high-quality guidance, i.e., predicted noises or images. With this improved guidance, the proposed method provides accurate and detailed gradients for optimizing the target 3D model. In addition, we propose two strategies to further enhance generation quality. First, a guidance calibration strategy uses an Euler solver to calibrate the guidance distribution, which accelerates the convergence of the 3D model. Second, a dual timestep strategy increases the consistency of the guidance and optimizes the 3D model from geometry to appearance. Experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency. The code is available at https://github.com/1YimingZhong/DreamLCM.
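For readers unfamiliar with SDS, the baseline update that the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of a generic SDS-style gradient, not the DreamLCM implementation: the names `sds_gradient` and `make_toy_predictor`, the noise schedule, and the weighting `1 - a_bar` are all assumptions made for the example, and the "noise predictor" is a hypothetical stand-in for a text-conditioned diffusion model that simply believes the clean image equals a fixed target.

```python
import numpy as np

def sds_gradient(x, predict_noise, alphas_cumprod, rng):
    """One SDS-style gradient w.r.t. the rendered image x (toy sketch)."""
    # A random timestep and random noise are drawn on every call.
    t = int(rng.integers(0, len(alphas_cumprod)))
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x.shape)
    # Forward diffusion: noise the rendered image to level t.
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * eps
    # Single-step noise prediction from the (here: toy) diffusion model.
    eps_hat = predict_noise(x_t, t)
    # Assumed weighting w(t) = 1 - a_bar; other choices are possible.
    weight = 1.0 - a_bar
    return weight * (eps_hat - eps)

def make_toy_predictor(target, alphas_cumprod):
    """Hypothetical predictor that assumes the clean image is `target`."""
    def predict_noise(x_t, t):
        a_bar = alphas_cumprod[t]
        return (x_t - np.sqrt(a_bar) * target) / np.sqrt(1.0 - a_bar)
    return predict_noise

# Assumed schedule: cumulative alphas decreasing from 0.99 to 0.01.
alphas_cumprod = np.linspace(0.99, 0.01, 50)
rng = np.random.default_rng(0)
target = np.ones((4, 4))
x = np.zeros((4, 4))  # stands in for a rendered view of the 3D model
predict_noise = make_toy_predictor(target, alphas_cumprod)

for _ in range(200):
    x = x - 0.5 * sds_gradient(x, predict_noise, alphas_cumprod, rng)
```

In a real pipeline the gradient would be backpropagated through a differentiable renderer to the 3D parameters rather than applied to the image directly. The fresh timestep and noise drawn on every call are the source of randomness named in factor 2 above.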