Automatic text-to-3D generation has advanced remarkably. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations such as Neural Radiance Fields (NeRFs) via latent-space denoising score matching. Yet these methods often produce artifacts and cross-view inconsistencies because of suboptimal optimization strategies and a limited understanding of 3D geometry. Moreover, the inherent limitations of NeRFs in rendering crisp geometry and stable textures usually force a two-stage optimization to attain high-resolution details. This work proposes holistic sampling and smoothing approaches that achieve high-quality text-to-3D generation in a single-stage optimization. We compute denoising scores in both the latent space and the image space of the text-to-image diffusion model. Instead of sampling timesteps at random (timesteps are also referred to as noise levels in denoising score matching), we introduce a novel timestep annealing approach that progressively reduces the sampled timestep throughout optimization. To generate high-quality renderings in a single stage, we propose a regularization of the variance of z-coordinates along NeRF rays. To address texture flickering in NeRFs, we introduce a kernel smoothing technique that refines importance sampling weights from coarse to fine, ensuring accurate and thorough sampling in high-density regions. Extensive experiments show that our method outperforms previous approaches, generating highly detailed and view-consistent 3D assets through a single-stage training process.
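The three sampling and smoothing ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration, not the method's actual formulation: the function names, the linear annealing schedule, the bounds `t_max` and `t_min`, the weighted-variance form of the z-coordinate penalty, and the `(0.25, 0.5, 0.25)` smoothing kernel are all hypothetical choices.

```python
def annealed_timestep(step, total_steps, t_max=0.98, t_min=0.02):
    """Noise level that shrinks from t_max to t_min as optimization proceeds.

    Assumption: a linear schedule; the abstract only states that the
    sampled timestep is progressively reduced.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return t_max - (t_max - t_min) * frac


def z_variance(weights, z_vals, eps=1e-8):
    """Weighted variance of sample depths along one ray.

    Assumption: the rendering weights are treated as an unnormalized
    distribution over z; a small variance means density is concentrated
    near a single surface.
    """
    total = sum(weights) + eps
    mean = sum(w * z for w, z in zip(weights, z_vals)) / total
    return sum(w * (z - mean) ** 2 for w, z in zip(weights, z_vals)) / total


def smooth_weights(weights, kernel=(0.25, 0.5, 0.25)):
    """Smooth coarse per-sample weights with an odd-length 1D kernel.

    Assumption: edge-replicated padding followed by renormalization, so the
    result can be used as a distribution for importance sampling.
    """
    half = len(kernel) // 2
    padded = [weights[0]] * half + list(weights) + [weights[-1]] * half
    smoothed = [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(weights))
    ]
    total = sum(smoothed)
    return [s / total for s in smoothed] if total > 0 else smoothed


if __name__ == "__main__":
    # Timestep shrinks over a hypothetical 1000-step optimization.
    print([round(annealed_timestep(s, 1000), 3) for s in (0, 500, 1000)])
    # A ray whose weight is split between two distant depths has high variance.
    print(z_variance([0.5, 0.0, 0.5], [1.0, 2.0, 3.0]))
    # A single spike of weight is spread to its neighbours before resampling.
    print(smooth_weights([0.0, 1.0, 0.0]))
```

In a sketch like this, the variance term would be added to the training loss as a penalty, and the smoothed weights would replace the raw coarse weights when drawing fine samples, so that regions adjacent to a density peak are not skipped.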