Humans have the remarkable ability to navigate unfamiliar environments by relying solely on prior knowledge and descriptions of the environment. For robots to perform the same type of navigation, they need to be able to associate natural language descriptions with the corresponding physical environment using only a limited amount of prior knowledge. Recently, Large Language Models (LLMs) with billions of parameters have shown the ability to reason and to produce multi-modal, chat-based natural language responses. However, LLMs lack real-world awareness, and their outputs are not always predictable. In this work, we develop NavCom, a low-bandwidth framework that addresses this lack of real-world generalization by creating an intermediate layer, in the form of Python code, between an LLM and a robot navigation framework. Our intermediate layer channels the vast prior knowledge inherent in an LLM into a series of input and output API instructions that a mobile robot can understand. We evaluate our method on a mobile robot across four different environments and command classes, and highlight NavCom's ability to interpret contextual commands.
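As a rough illustration of what an intermediate layer "in the form of Python code" could look like, the sketch below exposes a tiny navigation API to generated code and records the resulting robot commands. It is not NavCom's actual interface: `NavAPI`, `go_to`, `stop`, `run_generated_code`, the landmark table, and the hard-coded `generated` string (a stand-in for an LLM response) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NavAPI:
    """Hypothetical navigation API exposed to LLM-generated code."""
    landmarks: Dict[str, Tuple[float, float]]
    log: List[Tuple[str, tuple]] = field(default_factory=list)

    def go_to(self, name: str) -> bool:
        # Record a navigation goal only if the robot knows the landmark.
        if name not in self.landmarks:
            self.log.append(("unknown", (name,)))
            return False
        self.log.append(("go_to", self.landmarks[name]))
        return True

    def stop(self) -> None:
        # Record a stop command.
        self.log.append(("stop", ()))


def run_generated_code(code: str, api: NavAPI) -> List[Tuple[str, tuple]]:
    """Run generated code that can see only the whitelisted API names.

    Assumption: emptying __builtins__ is a convenience for this sketch,
    not a security sandbox; a real system would need stronger isolation.
    """
    namespace = {"__builtins__": {}, "go_to": api.go_to, "stop": api.stop}
    exec(code, namespace)
    return api.log


# Stand-in for an LLM response to "go to the kitchen, then stop".
generated = "if go_to('kitchen'):\n    stop()\n"
api = NavAPI(landmarks={"kitchen": (2.0, 3.5)})
commands = run_generated_code(generated, api)
print(commands)  # [('go_to', (2.0, 3.5)), ('stop', ())]
```

Restricting the generated code to a small set of API calls is one way to keep an LLM's unpredictable output within what the navigation stack can execute; a request for an unknown landmark is logged rather than turned into a goal.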