With recent developments in Embodied Artificial Intelligence (EAI) research, there has been a growing demand for high-quality, large-scale interactive scene generation. While prior methods in scene synthesis have prioritized the naturalness and realism of the generated scenes, the physical plausibility and interactivity of scenes remain largely unexplored. To address this gap, we introduce PhyScene, a novel method dedicated to generating interactive 3D scenes characterized by realistic layouts, articulated objects, and rich physical interactivity tailored for embodied agents. Based on a conditional diffusion model for capturing scene layouts, we devise novel physics- and interactivity-based guidance mechanisms that integrate constraints from object collision, room layout, and object reachability. Through extensive experiments, we demonstrate that PhyScene effectively leverages these guidance functions for physically interactable scene synthesis, outperforming existing state-of-the-art scene synthesis methods by a large margin. Our findings suggest that the scenes generated by PhyScene hold considerable potential for facilitating diverse skill acquisition among agents within interactive environments, thereby catalyzing further advancements in embodied AI research. Project website: http://physcene.github.io.
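To make the guidance idea concrete, the following is a minimal, illustrative sketch of gradient-based guidance during diffusion sampling: a differentiable constraint cost (here, a toy pairwise collision penalty over 2D object positions and radii) is evaluated on the denoiser's prediction, and the prediction is nudged down the cost's gradient. The `collision_cost` function, the finite-difference gradient, the identity denoiser, and the `scale` parameter are all simplifying assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def collision_cost(x):
    # Toy guidance cost (assumed for illustration): each row of x is
    # [cx, cy, radius]; penalize squared overlap between circle pairs.
    cost = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(x[i, :2] - x[j, :2])
            overlap = x[i, 2] + x[j, 2] - dist
            cost += max(overlap, 0.0) ** 2
    return cost

def position_grad(f, x, eps=1e-4):
    # Finite-difference gradient of the cost w.r.t. positions only;
    # radii are treated as fixed object properties.
    g = np.zeros_like(x)
    for i in range(x.shape[0]):
        for k in range(2):  # only cx, cy
            xp = x.copy(); xp[i, k] += eps
            xm = x.copy(); xm[i, k] -= eps
            g[i, k] = (f(xp) - f(xm)) / (2 * eps)
    return g

def guided_denoise_step(x_t, denoiser, scale=0.5):
    # One reverse-diffusion step with guidance: the denoiser's clean
    # prediction is shifted down the gradient of the constraint cost.
    x0_pred = denoiser(x_t)
    return x0_pred - scale * position_grad(collision_cost, x0_pred)

# Demo with an identity "denoiser": two overlapping objects are
# pushed apart, so the collision cost decreases toward zero.
x = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
for _ in range(5):
    x = guided_denoise_step(x, lambda z: z, scale=0.5)
```

In practice the same pattern applies to the other constraints the abstract names (room-layout containment, reachability): each is expressed as a cost whose gradient steers sampling, so multiple guidance terms can simply be summed.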