Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of a geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normal maps, which leads to instability in optimization. In this paper, recognizing that normal and depth information effectively describe scene geometry and can be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with generalizable image-to-depth and normal prior models. To alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that, when integrated into existing text-to-3D pipelines, our models significantly enhance detail richness and achieve state-of-the-art results. Our project page is https://lingtengqiu.github.io/RichDreamer/.
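For reference, the SDS gradient mentioned above is commonly written as follows. This is the standard form from the score-distillation literature, not a formula stated in this abstract, and all notation is assumed for illustration: \theta denotes the 3D representation parameters, x = g(\theta) a rendered image (here, a rendered normal map), y the text prompt, t the diffusion timestep, \epsilon the sampled Gaussian noise, \epsilon_\phi the pretrained denoiser, and w(t) a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad
\epsilon \sim \mathcal{N}(0, I)
```

Under this reading, the distribution discrepancy arises because \epsilon_\phi was trained on natural RGB images while x is a normal map; a diffusion model trained on normal and depth data would take the place of \epsilon_\phi in the geometry stage.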