We introduce RealmDreamer, a technique for generating forward-facing 3D scenes from text descriptions. Our method optimizes a 3D Gaussian Splatting representation to match complex text prompts using pretrained diffusion models. Our key insight is to leverage 2D inpainting diffusion models, conditioned on an initial scene estimate, to provide low-variance supervision for unknown regions during 3D distillation. In conjunction, we obtain high-fidelity geometry through geometric distillation from a depth diffusion model, conditioned on samples from the inpainting model. We find that the initialization of the optimization is crucial and provide a principled methodology for it. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in diverse styles with complex layouts. Further, the generality of our method allows 3D synthesis from a single image. As measured by a comprehensive user study, our method outperforms all existing approaches, preferred by 88-95% of participants. Project Page: https://realmdreamer.github.io/
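Below is a minimal PyTorch sketch of the two-part distillation loop the abstract describes: RGB supervision for unobserved regions from an inpainting diffusion model conditioned on the current render, plus geometric distillation against depth predicted from the inpainted sample. The renderer, inpainting model, depth model, mask heuristic, prompt, and loss weighting are all hypothetical stand-ins (stubbed with plain tensors); the paper's actual models, schedules, and objectives are not specified in this abstract.

```python
# Minimal sketch of the optimization loop described above. All model calls
# are hypothetical stand-ins, not the paper's actual components.
import torch

H, W = 64, 64

def render(params, cam):
    # Stand-in for a differentiable 3DGS rasterizer: returns RGB and depth.
    rgb = torch.sigmoid(params["color"])                      # (3, H, W)
    depth = torch.nn.functional.softplus(params["depth"])     # (1, H, W)
    return rgb, depth

def inpaint_sample(rgb, mask, prompt):
    # Stand-in for a 2D inpainting diffusion model conditioned on the current
    # render: fills masked (unobserved) pixels with a plausible completion.
    with torch.no_grad():
        return torch.where(mask.bool(), torch.rand_like(rgb), rgb)

def predict_depth(rgb):
    # Stand-in for a depth diffusion model conditioned on the inpainted image.
    with torch.no_grad():
        return rgb.mean(0, keepdim=True)

# Scene parameters, initialized from an initial scene estimate
# (the principled initialization the abstract refers to).
params = {
    "color": torch.zeros(3, H, W, requires_grad=True),
    "depth": torch.zeros(1, H, W, requires_grad=True),
}
opt = torch.optim.Adam(params.values(), lr=1e-2)

# Hypothetical mask of regions unobserved from the input view.
mask = torch.zeros(1, H, W)
mask[:, : H // 2] = 1.0

for step in range(100):
    cam = None  # placeholder: sample a camera near the forward-facing view
    rgb, depth = render(params, cam)

    # Low-variance RGB supervision: match the inpainting model's completion.
    target_rgb = inpaint_sample(rgb.detach(), mask, prompt="a cozy library")
    loss_rgb = torch.nn.functional.mse_loss(rgb, target_rgb)

    # Geometric distillation: match depth predicted from the inpainted sample.
    target_depth = predict_depth(target_rgb)
    loss_depth = torch.nn.functional.mse_loss(depth, target_depth)

    loss = loss_rgb + 0.5 * loss_depth  # weighting is an assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A plain MSE stands in for the distillation objective here; the structural point from the abstract is that both the RGB target and the depth target come from diffusion models conditioned on the current scene estimate, which is what keeps the supervision low-variance.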