Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex physical scenes introduces additional challenges: (a) higher object density and complexity (e.g., a small shelf may hold dozens of books), (b) richer supporting relationships and compact spatial layouts, and (c) the need to accurately model both spatial placement and physical properties. To address these challenges, we propose PhyScensis, an LLM agent-based framework powered by a physics engine, to produce physically plausible scene configurations with high complexity. Specifically, our framework consists of three main components: an LLM agent iteratively proposes assets with spatial and physical predicates; a solver, equipped with a physics engine, realizes these predicates into a 3D scene; and feedback from the solver informs the agent to refine and enrich the configuration. Moreover, our framework preserves strong controllability over fine-grained textual descriptions and numerical parameters (e.g., relative positions, scene stability), enabled through probabilistic programming for stability and a complementary heuristic that jointly regulates stability and spatial relations. Experimental results show that our method outperforms prior approaches in scene complexity, visual quality, and physical accuracy, offering a unified pipeline for generating complex physical scene layouts for robotic manipulation.
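The propose, solve, and feedback loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not the PhyScensis implementation. The LLM agent is replaced by a hypothetical stub (`stub_agent`), the physics-engine solver is replaced by a one-dimensional interval-overlap check with rejection sampling (`solve`), and all names and parameters (`Box`, `generate`, the widths, the feedback strings) are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Box:
    """A toy 1D asset: a book occupying an interval on a shelf."""
    name: str
    width: float
    x: float = 0.0  # centre position along the shelf, set by the solver


def overlaps(a: Box, b: Box) -> bool:
    """True if the two intervals intersect (stand-in for a collision check)."""
    return abs(a.x - b.x) < (a.width + b.width) / 2


def solve(placed, new, shelf_width, rng, tries=200):
    """Rejection-sample a collision-free position for `new`.

    Assumption: a real solver would query a physics engine; here we only
    test 1D overlap. Returns (success, feedback_string).
    """
    for _ in range(tries):
        new.x = rng.uniform(new.width / 2, shelf_width - new.width / 2)
        if not any(overlaps(new, p) for p in placed):
            return True, "ok"
    return False, "no_space"


def stub_agent(feedback: str, step: int) -> Box:
    """Hypothetical stand-in for the LLM agent.

    It proposes a wide book by default and a narrower one after the
    solver reports that the previous proposal did not fit.
    """
    width = 0.3 if feedback == "ok" else 0.1
    return Box(name=f"book_{step}", width=width)


def generate(shelf_width=1.0, steps=8, seed=0):
    """Run the propose -> solve -> feedback loop for a fixed number of steps."""
    rng = random.Random(seed)
    placed, feedback = [], "ok"
    for step in range(steps):
        candidate = stub_agent(feedback, step)
        success, feedback = solve(placed, candidate, shelf_width, rng)
        if success:
            placed.append(candidate)
    return placed


if __name__ == "__main__":
    for box in generate():
        print(f"{box.name}: width={box.width:.2f}, x={box.x:.3f}")
```

The point of the sketch is the control flow: the solver's feedback string is the only channel through which the proposer learns that the scene is getting crowded, and it responds by refining its next proposal. A faithful version would replace the overlap check with simulated stability and contact queries and the stub with an actual LLM call.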