We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in native 3D space. It formulates layout generation as an autoregressive process that explicitly models geometric relations and physical constraints among objects, producing coherent and physically plausible 3D scenes. To further enhance this process, we propose an adapted 3D diffusion model that integrates scene, object, and instruction information and employs a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy. Extensive experiments on the LayoutVLM benchmark show that LaviGen achieves superior 3D layout generation performance, with 19% higher physical plausibility than the state of the art and 65% faster computation. Our code is publicly available at https://github.com/fenghora/LaviGen.