Semantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. Succeeding in these tasks requires a large amount of the common-sense knowledge that humans possess. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment and provides it as input to LLMs, which use it to generate feasible and contextually appropriate high-level plans for navigation. The LLM-generated plan is then executed by a pre-trained low-level planner that treats each planned step as a short-distance point-goal navigation sub-task. SayNav dynamically generates step-by-step instructions during navigation and continuously refines future steps based on newly perceived information. We evaluate SayNav on a new multi-object navigation task, which requires the agent to draw on extensive human knowledge to efficiently search for multiple different objects in an unknown environment. Under the ideal settings of this task, SayNav outperforms an oracle-based Point-nav baseline, achieving a success rate of 95.35% (vs. 56.06% for the baseline), which highlights its ability to generate dynamic plans for successfully locating objects in large-scale new environments.
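The loop described above (grow a scene graph, serialize it as LLM input, execute only the next planned step as a point-goal, then replan with whatever was newly perceived) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration and not SayNav's actual implementation: the `SceneGraph` class, its room-to-object hierarchy (simplified to 2D positions instead of 3D), the text format produced by `to_prompt`, and the helpers `navigate` and `make_stub_planner` are all hypothetical. The LLM call is replaced by a stub planner, and the pre-trained low-level planner is replaced by a lookup into a toy world.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Observation = Tuple[str, str, Point]  # (room, object, position); hypothetical format


@dataclass
class SceneGraph:
    """Hypothetical room -> object hierarchy of the explored environment."""
    rooms: Dict[str, Dict[str, Point]] = field(default_factory=dict)

    def add_observation(self, room: str, obj: str, pos: Point) -> None:
        # Incremental update: create the room node if new, then add the object.
        self.rooms.setdefault(room, {})[obj] = pos

    def to_prompt(self) -> str:
        # Serialize the explored part of the scene as text an LLM could read.
        lines = []
        for room in sorted(self.rooms):
            objs = ", ".join(f"{o} at {p}" for o, p in sorted(self.rooms[room].items()))
            lines.append(f"{room}: {objs}")
        return "\n".join(lines)


def navigate(graph: SceneGraph, targets: List[str],
             plan_fn: Callable[[str, List[str]], List[Point]],
             execute_fn: Callable[[Point], List[Observation]],
             max_steps: int = 20) -> List[str]:
    """Plan, execute one point-goal step, update the graph, and replan."""
    remaining, found = list(targets), []
    for _ in range(max_steps):
        if not remaining:
            break
        steps = plan_fn(graph.to_prompt(), remaining)  # high-level plan
        if not steps:
            break
        # Execute only the first step; later steps are refined on the next replan.
        for room, obj, pos in execute_fn(steps[0]):
            graph.add_observation(room, obj, pos)
            if obj in remaining:
                remaining.remove(obj)
                found.append(obj)
    return found


def make_stub_planner(waypoints: List[Point]) -> Callable[[str, List[str]], List[Point]]:
    """Stand-in for the LLM: proposes the not-yet-visited waypoints in order."""
    visited = set()

    def plan(prompt: str, remaining: List[str]) -> List[Point]:
        # A real system would send `prompt` and `remaining` to an LLM and parse
        # its reply into point-goals; this stub ignores both.
        todo = [w for w in waypoints if w not in visited]
        if todo:
            visited.add(todo[0])
        return todo

    return plan


# Toy world: what the agent perceives after reaching each waypoint.
WORLD: Dict[Point, List[Observation]] = {
    (0.0, 0.0): [("hallway", "shoe", (0.5, 0.2))],
    (3.0, 1.0): [("kitchen", "mug", (3.2, 1.1)), ("kitchen", "apple", (2.8, 0.9))],
    (6.0, 2.0): [("bedroom", "laptop", (6.1, 2.3))],
}

graph = SceneGraph()
found = navigate(graph, ["mug", "laptop"], make_stub_planner(list(WORLD)),
                 lambda goal: WORLD.get(goal, []))
print(found)
print(graph.to_prompt())
```

Executing one step at a time and then replanning mirrors the abstract's point that future steps are refined as new information arrives, at the cost of one planner call per step.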