Automatic text-to-3D generation has advanced remarkably. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations such as Neural Radiance Fields (NeRFs) via latent-space denoising score matching. However, these methods often produce artifacts and inconsistencies across views because of suboptimal optimization strategies and a limited understanding of 3D geometry. Moreover, NeRFs have inherent difficulty rendering crisp geometry and stable textures, so a two-stage optimization is usually needed to attain high-resolution details. This work proposes holistic sampling and smoothing approaches that achieve high-quality text-to-3D generation in a single-stage optimization. We compute denoising scores in both the latent space and the image space of the text-to-image diffusion model. Instead of sampling timesteps randomly (timesteps are also referred to as noise levels in denoising score matching), we introduce a novel timestep annealing approach that progressively reduces the sampled timestep throughout optimization. To generate high-quality renderings in a single-stage optimization, we propose a regularizer on the variance of z-coordinates along NeRF rays. To address texture flickering in NeRFs, we introduce a kernel smoothing technique that refines importance sampling weights in a coarse-to-fine manner, ensuring accurate and thorough sampling in high-density regions. Extensive experiments show that our method outperforms previous approaches, generating highly detailed and view-consistent 3D assets through a single-stage training process.
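The timestep annealing idea can be illustrated with a minimal sketch. The abstract only states that the sampled timestep is progressively reduced during optimization; the linear schedule, the function name `annealed_timestep`, and the bounds `t_max=980` and `t_min=20` below are assumptions chosen for illustration, not the schedule used in this work.

```python
def annealed_timestep(step, total_steps, t_max=980, t_min=20):
    """Return a diffusion timestep that shrinks as optimization proceeds.

    Assumption: a linear decay from t_max to t_min. The abstract does not
    specify the shape of the schedule, only that the timestep decreases.
    """
    if total_steps <= 1:
        return t_min
    # Fraction of optimization completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return int(round(t_max - (t_max - t_min) * progress))


# Early iterations use high noise levels, late iterations use low ones.
schedule = [annealed_timestep(s, 5) for s in range(5)]
```

Compared with sampling a timestep uniformly at random on every iteration, a decreasing schedule of this kind spends early iterations on coarse, high-noise guidance and later iterations on low-noise refinement.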
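A regularizer on the variance of z-coordinates along NeRF rays might look like the following sketch. It assumes that each ray carries per-sample rendering weights and sample depths, treats the normalized weights as a distribution over depth, and penalizes its variance averaged over rays. The function name `z_variance_loss`, the normalization, and the plain-list inputs are illustrative assumptions; a real training loop would compute this on tensors with automatic differentiation.

```python
def z_variance_loss(weights, z_vals, eps=1e-8):
    """Mean over rays of the weighted variance of sample depths.

    weights, z_vals: lists of rays, each ray a list of per-sample values.
    Assumption: weights are normalized per ray before taking moments.
    """
    per_ray = []
    for w_ray, z_ray in zip(weights, z_vals):
        total = sum(w_ray) + eps
        probs = [w / total for w in w_ray]
        # Expected depth along the ray under the normalized weights.
        mean_z = sum(p * z for p, z in zip(probs, z_ray))
        # Spread of the weight distribution around that expected depth.
        var_z = sum(p * (z - mean_z) ** 2 for p, z in zip(probs, z_ray))
        per_ray.append(var_z)
    return sum(per_ray) / len(per_ray)


# A ray whose weight sits on one sample has near-zero variance;
# a ray whose weight is spread along the depth axis is penalized.
sharp = z_variance_loss([[0.0, 1.0, 0.0]], [[1.0, 2.0, 3.0]])
diffuse = z_variance_loss([[1.0, 1.0, 1.0]], [[1.0, 2.0, 3.0]])
```

Minimizing such a term pushes the weights on each ray to concentrate around a single depth, which is one way to encourage crisp surfaces.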