Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner; and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors, 36 rooms, and 140 objects, and show that it is capable of grounding large-scale, long-horizon task plans from abstract, natural language instructions for a mobile manipulator robot to execute.
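The search-then-replan loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not SayPlan's implementation: the `Room` class, `semantic_search`, `check_plan`, and `replan` are hypothetical names, and the two stand-in functions (`first_unexplored`, `toy_planner`) take the place of the LLM. Each `goto` is a single plan step because the route itself would be delegated to a classical path planner, which is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """One room node of a toy scene graph (hypothetical structure)."""
    name: str
    floor: int
    objects: list = field(default_factory=list)
    expanded: bool = False


def collapsed_view(rooms):
    """What the planner sees: objects are hidden unless the room is expanded."""
    return {r.name: (list(r.objects) if r.expanded else None) for r in rooms}


def semantic_search(rooms, targets, choose_room):
    """Expand rooms one at a time; contract any room that holds no target."""
    by_name = {r.name: r for r in rooms}
    explored, missing = set(), set(targets)
    while missing:
        name = choose_room(collapsed_view(rooms), explored)
        if name is None:  # nothing left to explore
            break
        explored.add(name)
        room = by_name[name]
        room.expanded = True
        hits = missing & set(room.objects)
        if hits:
            missing -= hits
        else:
            room.expanded = False  # irrelevant room: collapse it again
    subgraph = {r.name: list(r.objects) for r in rooms if r.expanded}
    return subgraph, missing


def check_plan(plan, subgraph):
    """Toy simulator: return None if feasible, else a textual error message."""
    location = None
    for i, (action, arg) in enumerate(plan):
        if action == "goto":
            if arg not in subgraph:
                return f"step {i}: unknown room {arg}"
            location = arg
        elif action == "pick":
            if location is None or arg not in subgraph[location]:
                return f"step {i}: {arg} is not reachable from {location}"
    return None


def replan(propose, subgraph, max_iters=3):
    """Ask the planner again, passing simulator feedback, until the plan passes."""
    feedback = None
    for _ in range(max_iters):
        plan = propose(subgraph, feedback)
        feedback = check_plan(plan, subgraph)
        if feedback is None:
            return plan
    return None


def first_unexplored(view, explored):
    """Stand-in for the LLM's room choice: take the first unexplored room."""
    return next((name for name in view if name not in explored), None)


def toy_planner(subgraph, feedback, target="mug"):
    """Stand-in for the LLM planner: the first attempt forgets to navigate."""
    if feedback is None:
        return [("pick", target)]
    room = next(name for name, objs in subgraph.items() if target in objs)
    return [("goto", room), ("pick", target)]


rooms = [
    Room("office", 1, ["stapler"]),
    Room("kitchen", 1, ["mug", "kettle"]),
    Room("lab", 2, ["drill"]),
]
subgraph, missing = semantic_search(rooms, {"mug"}, first_unexplored)
plan = replan(toy_planner, subgraph)
```

In this toy run the search expands and then contracts the office, keeps the kitchen, and never opens the lab, so the planner only ever sees the task-relevant subgraph. The first proposed plan fails the simulator check, and the feedback message drives the corrected second attempt.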