Indoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-based generation challenging. Existing methods often rely on hand-crafted rules or focus on isolated sub-tasks (e.g., floorplan synthesis or single-room furnishing), producing whole-home scenes that lack global coherence, realism, and simulation readiness. To mitigate these limitations, we propose a unified hierarchical framework that decomposes indoor scene synthesis into controllable stages. First, we curate a large-scale dataset of 300K real residential floorplans to train a large language model for whole-home floorplan generation. With detailed descriptions and a K-D tree-based representation, our method enables fine-grained, controllable whole-home floorplan generation. Building upon the generated whole-home floorplan, we leverage image generation models to draft furniture layouts from multi-level roaming viewpoints, and then generate the layouts of small manipulable objects on different supporting surfaces (e.g., cabinets, desks, and dining tables) for embodied AI simulation. During furniture and object layout generation, a VLM-based refiner iteratively corrects furniture and object placement, and a 3D generative model enables flexible replacement of individual assets. We further attach basic physical attributes and simple surface texture and lighting setups to complete the pipeline for embodied AI use. Experiments and user studies demonstrate that our pipeline produces indoor spaces with greater layout diversity and stronger 3D design appeal, outperforming prior methods on both quantitative and qualitative metrics. Finally, alongside our generation pipeline, we will release the floorplan dataset and 5K fully furnished scenes to the community. Project Page: https://kairos-homeworld.github.io/
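To make the K-D tree-based floorplan representation more concrete, the following is a minimal sketch of one way such a tree could encode a rectangular floorplan. The abstract does not specify the actual encoding, so everything here is an assumption made for illustration: the `Node` class, the `axis`/`ratio` split parameters, the `decode` and `serialize` helpers, and the room names are all hypothetical. The idea shown is only that internal nodes cut a rectangle along one axis and leaves carry room labels, which yields a compact nested string that a language model could plausibly emit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in arbitrary length units
Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    """Hypothetical k-d tree node: a leaf names a room, an internal node cuts space."""
    room: Optional[str] = None     # set only on leaves
    axis: int = 0                  # 0: cut at a fixed x, 1: cut at a fixed y
    ratio: float = 0.5             # cut position as a fraction of the extent
    low: Optional["Node"] = None   # child on the low-coordinate side of the cut
    high: Optional["Node"] = None  # child on the high-coordinate side of the cut


def decode(node: Node, rect: Rect) -> List[Tuple[str, Rect]]:
    """Recursively split `rect` and return one (room, rectangle) pair per leaf."""
    if node.room is not None:
        return [(node.room, rect)]
    x0, y0, x1, y1 = rect
    if node.axis == 0:
        cut = x0 + node.ratio * (x1 - x0)
        low_rect, high_rect = (x0, y0, cut, y1), (cut, y0, x1, y1)
    else:
        cut = y0 + node.ratio * (y1 - y0)
        low_rect, high_rect = (x0, y0, x1, cut), (x0, cut, x1, y1)
    return decode(node.low, low_rect) + decode(node.high, high_rect)


def serialize(node: Node) -> str:
    """Flatten the tree into a nested string (an assumed, illustrative format)."""
    if node.room is not None:
        return node.room
    axis_name = "x" if node.axis == 0 else "y"
    return f"({axis_name} {node.ratio:g} {serialize(node.low)} {serialize(node.high)})"


# Illustrative three-room plan on a 10 x 8 footprint.
plan = Node(
    axis=0, ratio=0.5,
    low=Node(room="living_room"),
    high=Node(axis=1, ratio=0.25,
              low=Node(room="kitchen"),
              high=Node(room="bedroom")),
)
rooms = decode(plan, (0.0, 0.0, 10.0, 8.0))
text = serialize(plan)
```

Under these assumptions, every decoded room is an axis-aligned rectangle and the rooms tile the footprint without gaps or overlaps by construction, which is one reason a tree of cuts is attractive for controllable generation. Real residential floorplans are often non-rectangular, so the actual representation presumably needs more than this sketch provides.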