Semantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. Succeeding in these tasks requires a large amount of the common-sense knowledge that humans possess. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment and provides it as input to LLMs, which then generate feasible and contextually appropriate high-level plans for navigation. The LLM-generated plan is executed by a pre-trained low-level planner, which treats each planned step as a short-distance point-goal navigation sub-task. SayNav dynamically generates step-by-step instructions during navigation and continuously refines future steps based on newly perceived information. We evaluate SayNav on a new multi-object navigation task that requires the agent to draw on a large amount of human knowledge to efficiently search for multiple different objects in an unknown environment. Under ideal settings on this task, SayNav outperforms an oracle-based Point-nav baseline, achieving a success rate of 95.35% (vs. 56.06% for the baseline), which highlights its ability to generate dynamic plans for successfully locating objects in large-scale new environments. SayNav also enables efficient generalization of learned navigation from simulation to novel real-world environments.
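The plan, execute and replan loop described above can be illustrated with a toy sketch. This is not SayNav's implementation; it is a minimal illustration under loudly stated assumptions. The 3D scene graph is reduced to a room-to-objects dictionary. The LLM is replaced by a hypothetical rule-based stand-in (`llm_planner_stub`) driven by a hard-coded common-sense prior; in the real system, an LLM reads the grounded graph and proposes steps. The pre-trained point-goal policy is replaced by a hypothetical `point_nav` that moves straight to its goal. The world layout, object names and prior are all invented.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical incremental scene graph: room -> {object: position}."""
    rooms: dict = field(default_factory=dict)

    def add_room(self, room, detections):
        # Merge newly perceived objects; earlier observations are kept.
        self.rooms.setdefault(room, {}).update(detections)

    def locate(self, obj):
        # Return (room, position) if the object has been observed, else None.
        for room, objs in self.rooms.items():
            if obj in objs:
                return room, objs[obj]
        return None


def llm_planner_stub(graph, remaining, unexplored, prior):
    """Rule-based stand-in for the LLM (assumption): ordered (kind, name, goal) steps."""
    steps = []
    for target in remaining:  # already-seen targets: go straight to them
        hit = graph.locate(target)
        if hit is not None:
            steps.append(("object", target, hit[1]))
    for target in remaining:  # unseen targets: common-sense guess of the room
        room = prior.get(target)
        if graph.locate(target) is None and room in unexplored:
            steps.append(("room", room, None))
    steps += [("room", r, None) for r in unexplored]  # fall back to exploring
    return steps


def point_nav(position, goal):
    """Stand-in for a learned point-goal policy: returns new position and distance."""
    return goal, math.dist(position, goal)


def run_episode(world, targets, prior, start_room, max_steps=20):
    graph, explored = SceneGraph(), {start_room}
    graph.add_room(start_room, world[start_room]["objects"])
    position = world[start_room]["center"]
    remaining, found, travelled = list(targets), [], 0.0
    for _ in range(max_steps):
        if not remaining:
            break
        unexplored = sorted(set(world) - explored)
        plan = llm_planner_stub(graph, remaining, unexplored, prior)
        if not plan:
            break  # nothing left to search
        kind, name, goal = plan[0]  # execute only the first step, then replan
        if kind == "room":
            goal = world[name]["center"]
        position, dist = point_nav(position, goal)
        travelled += dist
        if kind == "room":
            explored.add(name)
            graph.add_room(name, world[name]["objects"])  # new perception
        else:
            remaining.remove(name)
            found.append(name)
    return found, travelled


# Invented example environment and common-sense prior.
WORLD = {
    "hallway": {"center": (0.0, 0.0), "objects": {}},
    "kitchen": {"center": (4.0, 0.0), "objects": {"mug": (5.0, 0.0)}},
    "bedroom": {"center": (0.0, 3.0), "objects": {"pillow": (0.0, 4.0)}},
}
PRIOR = {"mug": "kitchen", "pillow": "bedroom"}

if __name__ == "__main__":
    found, dist = run_episode(WORLD, ["pillow", "mug"], PRIOR, "hallway")
    print(found, round(dist, 2))  # prints ['pillow', 'mug'] 10.66
```

Executing only the first planned step before replanning mirrors the abstract's point that future steps are refined as new observations arrive. Here, the bedroom visit reveals the pillow, and the updated graph immediately changes the next step.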