We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor supports text-guided 3D scene synthesis through a factored diffusion formulation, leveraging latent semantic and geometric manifolds to generate arbitrary-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of generated 3D scenes. Our factored semantic diffusion therefore generates a proxy semantic space composed of semantic 3D boxes; adding, removing, or resizing these proxy boxes enables controllable editing of generated scenes and guides high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our factored diffusion approach enables high-fidelity 3D scene synthesis with effective controllable editing.