STABLE: Simulation-Ready Tabletop Layout Generation via a Semantics-Physics Dual System

Generating simulation-ready tabletop scenes from task instructions is a promising research direction in Embodied AI. However, existing task-to-scene generation methods rely exclusively on large language models (LLMs) to predict scene layouts, which inevitably yields colliding or floating objects because of LLMs' inherent limitations in 3D spatial reasoning. In this paper, we present STABLE, a semantics-physics dual system tailored for simulation-ready tabletop scene generation. STABLE consists of two complementary modules: (i) a Semantic Reasoner, an LLM fine-tuned on a structured tabletop scene dataset to generate coarse layouts from input task instructions, and (ii) a Physics Corrector, a physics-aware flow-based denoising model that outputs pose updates to refine layouts, ensuring the physical plausibility of scenes while preserving semantic alignment with the task instructions. STABLE adopts a progressive generation paradigm: by alternating between the Semantic Reasoner and the Physics Corrector, it incrementally expands the scene from task-critical objects to background objects. Experiments demonstrate that STABLE generates simulation-ready tabletop scenes that strictly conform to task instructions and significantly improves the physical validity of scenes over prior art.
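The alternating, progressive loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the STABLE implementation: `semantic_reasoner` is a hard-coded stand-in for the fine-tuned LLM, `physics_corrector` replaces the learned flow-based denoising model with a simple geometric heuristic (snap objects to the table, then push overlapping disc footprints apart), and the object names, the disc approximation, and the two-stage split are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x: float       # position on the table plane (metres)
    y: float
    z: float       # height of the object's base above the table surface
    radius: float  # footprint approximated as a disc (illustrative assumption)


def semantic_reasoner(stage, scene):
    """Hypothetical stand-in for the LLM: returns coarse poses for one stage.

    A real model would condition on the task instruction and on `scene`;
    here the proposals are hard-coded and deliberately imperfect.
    """
    proposals = {
        "task": [Obj("mug", 0.00, 0.00, 0.03, 0.05),
                 Obj("kettle", 0.06, 0.00, 0.00, 0.08)],
        "background": [Obj("napkin", 0.05, 0.05, 0.02, 0.04)],
    }
    return proposals[stage]


def physics_corrector(scene, max_steps=200):
    """Heuristic stand-in for the learned corrector (not a flow-based model)."""
    for obj in scene:
        obj.z = 0.0  # remove floating: rest every object on the table
    for _ in range(max_steps):
        moved = False
        for i in range(len(scene)):
            for j in range(i + 1, len(scene)):
                a, b = scene[i], scene[j]
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                overlap = a.radius + b.radius - dist
                if overlap > 1e-9:
                    if dist < 1e-12:  # coincident centres: pick a direction
                        dx, dy, dist = 1.0, 0.0, 1.0
                    ux, uy = dx / dist, dy / dist
                    shift = overlap / 2 + 1e-6  # small margin past contact
                    a.x -= ux * shift
                    a.y -= uy * shift
                    b.x += ux * shift
                    b.y += uy * shift
                    moved = True
        if not moved:
            break
    return scene


def is_valid(scene):
    """True if no object floats and no two disc footprints overlap."""
    if any(obj.z != 0.0 for obj in scene):
        return False
    for i in range(len(scene)):
        for j in range(i + 1, len(scene)):
            a, b = scene[i], scene[j]
            if math.hypot(b.x - a.x, b.y - a.y) < a.radius + b.radius - 1e-9:
                return False
    return True


def generate(stages=("task", "background")):
    """Alternate reasoner and corrector, growing the scene stage by stage."""
    scene = []
    for stage in stages:
        scene.extend(semantic_reasoner(stage, scene))
        scene = physics_corrector(scene)
    return scene
```

Calling `generate()` returns the three objects with their poses adjusted so that `is_valid` holds. The sketch only shows the control flow (propose, correct, then expand); it makes no attempt to preserve semantic alignment during correction, which the abstract attributes to the learned Physics Corrector.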

