The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existing generative approaches often fail to capture the functional complexity of real-world interiors, particularly interiors containing articulated objects whose movable parts are essential for manipulation and navigation. This paper presents SceneFoundry, a language-guided diffusion framework that generates apartment-scale 3D worlds with functionally articulated furniture and semantically diverse layouts for robotic training. From natural language prompts, an LLM module controls floor-layout generation, while diffusion-based posterior sampling efficiently populates the scene with articulated assets drawn from large-scale 3D repositories. To ensure physical usability, SceneFoundry employs differentiable guidance functions to regulate object quantity, prevent articulation collisions, and maintain sufficient walkable space for robotic navigation. Extensive experiments demonstrate that our framework generates structurally valid, semantically coherent, and functionally interactive environments across diverse scene types and conditions, enabling scalable embodied AI research. Project page: https://anc891203.github.io/SceneFoundry-Demo/
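The abstract's "differentiable guidance functions" can be understood as penalty terms whose gradients steer diffusion samples, in the style of classifier guidance. The sketch below is a minimal, hypothetical illustration (not the paper's actual implementation): a differentiable disc-overlap penalty over 2D object positions, its analytic gradient, and one guidance step that nudges objects apart.

```python
import numpy as np

def collision_penalty(pos, radius=0.5):
    """Sum of squared pairwise disc overlaps: a smooth collision proxy."""
    diff = pos[:, None, :] - pos[None, :, :]       # (N, N, 2) pairwise offsets
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-9)     # (N, N) pairwise distances
    overlap = np.maximum(2 * radius - dist, 0.0)   # > 0 where discs intersect
    np.fill_diagonal(overlap, 0.0)
    return 0.25 * (overlap ** 2).sum()             # each pair counted twice

def collision_grad(pos, radius=0.5):
    """Analytic gradient of collision_penalty w.r.t. object positions."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-9)
    overlap = np.maximum(2 * radius - dist, 0.0)
    np.fill_diagonal(overlap, 0.0)
    coef = -overlap / dist                         # d(penalty)/d(dist) per pair
    return (coef[..., None] * diff).sum(axis=1)    # (N, 2)

# One guidance step: move noisy object positions down the penalty gradient,
# as a classifier-guidance-style posterior-sampling update would.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 2.0, size=(6, 2))           # 6 objects in a 2 m x 2 m room
before = collision_penalty(pos)
pos_guided = pos - 0.05 * collision_grad(pos)      # small guided step
after = collision_penalty(pos_guided)
```

In a full pipeline, analogous penalties for object count and walkable free space would be summed and their combined gradient applied at each denoising step; the hinge form `max(2r - d, 0)` keeps the penalty zero (and gradient-free) for already-separated objects.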