STABLE: Simulation-Ready Tabletop Layout Generation via a Semantics-Physics Dual System

Generating simulation-ready tabletop scenes from task instructions is a promising research direction in Embodied AI. However, existing task-to-scene generation methods rely exclusively on large language models (LLMs) to predict scene layouts, which inevitably yields colliding or floating objects because LLMs have inherent limitations in 3D spatial reasoning. In this paper, we present STABLE, a semantics-physics dual system tailored for simulation-ready tabletop scene generation. STABLE consists of two complementary modules: (i) a Semantic Reasoner, an LLM fine-tuned on a structured tabletop scene dataset to generate coarse layouts from input task instructions, and (ii) a Physics Corrector, a physics-aware flow-based denoising model that outputs pose updates to refine layouts, ensuring the physical plausibility of scenes while preserving semantic alignment with task instructions. STABLE adopts a progressive generation paradigm: by alternating between the Semantic Reasoner and the Physics Corrector, it incrementally expands the scene from task-critical objects to background objects. Experiments demonstrate that STABLE generates simulation-ready tabletop scenes that strictly conform to task instructions and significantly improves the physical validity of scenes over prior art.
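
The alternation between the two modules can be illustrated with a toy sketch. This is not the authors' implementation: `semantic_reasoner` is a hypothetical stand-in that returns hard-coded coarse poses instead of querying a fine-tuned LLM, and `physics_corrector` replaces the flow-based denoising model with a simple iterative push-apart rule on circular footprints. The `Obj` fields, the stage names, and all numeric values are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x: float       # position on the table plane (metres)
    y: float
    z: float       # height of the object's base above the table surface
    radius: float  # footprint approximated by a circle (an assumption)


def semantic_reasoner(stage):
    """Hypothetical stand-in for the fine-tuned LLM: returns a coarse layout.

    The poses are deliberately imperfect (overlapping or floating) to mimic
    the spatial errors the abstract attributes to LLM-only prediction.
    """
    coarse = {
        "task-critical": [Obj("mug", 0.00, 0.00, 0.02, 0.05),
                          Obj("plate", 0.04, 0.00, 0.00, 0.10)],
        "background": [Obj("vase", 0.05, 0.05, 0.03, 0.04)],
    }
    return coarse[stage]


def overlaps(a, b):
    """True if the circular footprints of a and b intersect."""
    return math.hypot(b.x - a.x, b.y - a.y) < a.radius + b.radius


def physics_corrector(scene, margin=1e-3, max_iters=100):
    """Toy replacement for the Physics Corrector: applies small pose updates.

    Floating objects are dropped onto the table, then overlapping pairs are
    pushed apart symmetrically until no footprint intersects another.
    """
    for obj in scene:
        obj.z = 0.0  # remove floating
    for _ in range(max_iters):
        moved = False
        for i, a in enumerate(scene):
            for b in scene[i + 1:]:
                if not overlaps(a, b):
                    continue
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                # Fall back to a fixed direction if the centres coincide.
                ux, uy = (dx / dist, dy / dist) if dist > 0 else (1.0, 0.0)
                shift = (a.radius + b.radius + margin - dist) / 2
                a.x -= ux * shift
                a.y -= uy * shift
                b.x += ux * shift
                b.y += uy * shift
                moved = True
        if not moved:
            break
    return scene


def generate_scene(stages=("task-critical", "background")):
    """Progressive generation: propose a stage, correct it, then expand."""
    scene = []
    for stage in stages:
        scene.extend(semantic_reasoner(stage))  # coarse, semantic proposal
        physics_corrector(scene)                # physical refinement
    return scene


if __name__ == "__main__":
    for obj in generate_scene():
        print(obj)
```

In this sketch every object may move during correction, including ones placed in an earlier stage; whether STABLE freezes previously placed objects is not stated in the abstract.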

