Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a 'semantic search' for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner; and (3) introduce an 'iterative replanning' pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects, and show that it can ground large-scale, long-horizon task plans from abstract, natural language instructions for a mobile manipulator robot to execute. We provide real robot video demonstrations on our project page: https://sayplan.github.io.
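To make the three components above more concrete, the following is a minimal Python sketch under stated assumptions, not the SayPlan implementation. The toy scene graph, the room adjacency, the action names (`goto`, `pick`, `release`), and every function name are hypothetical and chosen only for illustration. In particular, the LLM is replaced by plain stub callables (`first_candidate`, `stub_planner`), and breadth-first search stands in for the classical path planner.

```python
from collections import deque

# Hypothetical toy 3D scene graph: floor -> room -> objects.
SCENE = {
    "floor1": {"kitchen": ["fridge", "mug"], "lobby": ["sofa"]},
    "floor2": {"office": ["desk", "stapler"], "lab": ["printer"]},
}
# Hypothetical room adjacency used by the path planner.
ADJ = {
    "kitchen": ["lobby"],
    "lobby": ["kitchen", "office"],
    "office": ["lobby", "lab"],
    "lab": ["office"],
}


def semantic_search(target, choose_room):
    """(1) Start from a collapsed graph (room nodes only) and expand one room
    at a time until the target object becomes visible."""
    rooms = {room: objs for floor in SCENE.values() for room, objs in floor.items()}
    collapsed = set(rooms)  # object nodes are hidden until a room is expanded
    explored = []
    while collapsed:
        room = choose_room(target, sorted(collapsed))  # stands in for the LLM
        collapsed.discard(room)
        explored.append(room)
        if target in rooms[room]:  # expansion revealed the target
            return room, explored
        # Otherwise the room is contracted again and the search continues.
    return None, explored


def shortest_path(start, goal):
    """(2) Breadth-first search over rooms, so the high-level planner only
    has to name a destination rather than every waypoint."""
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in ADJ[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable


def simulate(plan, state):
    """Toy simulator: return None if the plan is feasible, else feedback text."""
    state = dict(state)  # do not mutate the caller's state
    for i, (action, arg) in enumerate(plan):
        if action == "goto":
            state["room"] = arg
        elif action == "pick":
            if state["holding"] is not None:
                return f"step {i}: cannot pick {arg} while holding {state['holding']}"
            state["holding"] = arg
        elif action == "release":
            state["holding"] = None
    return None


def replan(propose, state, max_iters=3):
    """(3) Ask the planner for a plan, simulate it, and feed errors back."""
    feedback = None
    for _ in range(max_iters):
        plan = propose(feedback)
        feedback = simulate(plan, state)
        if feedback is None:
            return plan
    return None  # no feasible plan within the iteration budget


def first_candidate(target, candidates):
    """Stub for the LLM's room choice: always take the first candidate."""
    return candidates[0]


BAD_PLAN = [("pick", "mug"), ("goto", "office"), ("pick", "stapler")]
GOOD_PLAN = [("pick", "mug"), ("release", "mug"), ("goto", "office"), ("pick", "stapler")]


def stub_planner(feedback):
    """Stub for the LLM planner: corrects its plan once it receives feedback."""
    return BAD_PLAN if feedback is None else GOOD_PLAN
```

In this sketch, `replan(stub_planner, {"room": "kitchen", "holding": None})` rejects the first proposal because the simulator reports a second pick while the gripper is occupied, then accepts the corrected plan on the next iteration. A real system would replace the stubs with LLM calls and the toy simulator with one that checks preconditions against the full scene graph.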