This work explores the capacity of large language models (LLMs) to address problems at the intersection of spatial planning and natural language interfaces for navigation. Our focus is on following relatively complex instructions that are closer to natural conversation than to the explicit procedural directives traditionally used in robotics. Unlike most prior work, where navigation directives are given as imperative commands (e.g., "go to the fridge"), we examine implicit directives within conversational interactions. We use the 3D simulator AI2Thor to create complex, repeatable scenarios at scale, and augment it with complex language queries for 40 object types. We demonstrate that, by using an LLM to interpret the user interaction in the context of a list of the objects in the scene, a robot can parse descriptive language queries better than with existing methods.
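The idea of interpreting a conversational remark against a list of scene objects can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names `build_prompt` and `select_target`, the example object names, and the `llm` callable (any function that maps a prompt string to a reply string) are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def build_prompt(utterance: str, scene_objects: Sequence[str]) -> str:
    """Combine the user's remark with the scene's object list (hypothetical prompt wording)."""
    object_list = ", ".join(sorted(set(scene_objects)))
    return (
        "You are helping a household robot choose a navigation target.\n"
        f"Objects in the scene: {object_list}\n"
        f'The user said: "{utterance}"\n'
        "Reply with exactly one object name from the list."
    )


def select_target(
    utterance: str,
    scene_objects: Sequence[str],
    llm: Callable[[str], str],
) -> Optional[str]:
    """Ask the LLM for a target and accept it only if it names a known scene object."""
    reply = llm(build_prompt(utterance, scene_objects))
    cleaned = reply.strip().strip(".\"'").lower()
    by_lower = {name.lower(): name for name in scene_objects}
    if cleaned in by_lower:
        return by_lower[cleaned]
    # Fallback: accept a longer reply only if it mentions exactly one known object.
    mentioned = [name for low, name in by_lower.items() if low in cleaned]
    return mentioned[0] if len(mentioned) == 1 else None


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs without any external service.
    def fake_llm(prompt: str) -> str:
        return "Fridge."

    objects = ["Fridge", "Sofa", "Television"]
    print(select_target("I'm really thirsty after that run.", objects, fake_llm))
```

Restricting the accepted answer to names that appear in the scene list keeps the navigation target grounded in what the simulator actually contains; a reply that names no listed object, or several, is rejected rather than guessed at.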