We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first use large language models (LLMs) to generate an initial layout and introduce a layout-guided 3D Gaussian representation with adaptive geometric constraints for 3D content generation. We then propose an object-scene compositional optimization mechanism with conditioned diffusion that collaboratively generates realistic 3D scenes with consistent geometry, texture, and scale and with accurate interactions among multiple objects, while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing that preserves the high fidelity of object-level entities within the scene. Source code and models will be available at https://gala3d.github.io/.
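To make the idea of a layout acting as a geometric constraint concrete, the sketch below stores a coarse layout as a list of axis-aligned boxes and computes a penalty for Gaussian centers that fall outside their object's box. This is an illustration under stated assumptions, not GALA3D's actual formulation: the abstract does not specify the form of the adaptive geometric constraints, and the names `LayoutBox` and `outside_box_penalty`, the box parameterization, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayoutBox:
    """One layout entry (hypothetical stand-in for an LLM-proposed box)."""
    name: str
    center: tuple  # (x, y, z)
    size: tuple    # full extents along x, y, z


def outside_box_penalty(points, box):
    """Mean squared distance by which points exceed the box; 0.0 if all are inside."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        for coord, c, s in zip(p, box.center, box.size):
            # Distance past the box face along this axis (0 when inside).
            excess = max(abs(coord - c) - s / 2.0, 0.0)
            total += excess * excess
    return total / len(points)


# Hypothetical two-object layout, as an LLM might propose for "a lamp on a table".
layout = [
    LayoutBox("table", center=(0.0, 0.0, 0.0), size=(2.0, 1.0, 1.0)),
    LayoutBox("lamp", center=(0.0, 1.0, 0.0), size=(0.5, 1.0, 0.5)),
]

# Two Gaussian centers assigned to the table: one inside, one 0.5 past the +x face.
gaussian_centers = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
penalty = outside_box_penalty(gaussian_centers, layout[0])
```

A term like this could be added to an optimization loss to keep each object's Gaussians near its box. Adjusting the layout to match the generated scene, as the abstract describes, would additionally require treating the box parameters as optimizable, which this sketch does not attempt.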