This work explores the capacity of large language models (LLMs) to address problems at the intersection of spatial planning and natural language interfaces for navigation. We focus on following relatively complex instructions that resemble natural conversation more than the explicit procedural directives traditionally used in robotics. Unlike most prior work, where navigation directives are given as imperative commands (e.g., "go to the fridge"), we examine implicit directives embedded in conversational interactions. We use the 3D simulator AI2Thor to create complex, repeatable scenarios at scale, and augment it with complex language queries for 40 object types. We demonstrate that a robot parses descriptive language queries better than existing methods when an LLM interprets the user interaction in the context of a list of the objects in the scene.
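To make the idea of interpreting a user interaction against a scene object list concrete, the following is a minimal sketch, not the authors' implementation. The prompt wording, the function names `build_prompt` and `resolve_target`, the stub `keyword_stub_llm`, and the example object names are all assumptions introduced for illustration; a real system would replace the stub with an actual LLM call.

```python
from typing import Callable, Optional, Sequence


def build_prompt(utterance: str, scene_objects: Sequence[str]) -> str:
    """Combine the user's utterance with the list of objects in the scene.

    The prompt wording is hypothetical, not taken from the paper.
    """
    object_list = ", ".join(sorted(set(scene_objects)))
    return (
        "You are helping a household robot decide where to go.\n"
        f"Objects in the scene: {object_list}\n"
        f'User says: "{utterance}"\n'
        "Reply with the single object from the list that the robot "
        "should navigate to."
    )


def resolve_target(
    utterance: str,
    scene_objects: Sequence[str],
    llm: Callable[[str], str],
) -> Optional[str]:
    """Ask the LLM for a target and map its reply back to a known object.

    Returns None when the reply names no object from the scene, so the
    caller can ask a clarifying question instead of moving.
    """
    reply = llm(build_prompt(utterance, scene_objects)).lower()
    # Check longer names first so a short name cannot shadow a longer one.
    for name in sorted(set(scene_objects), key=len, reverse=True):
        if name.lower() in reply:
            return name
    return None


def keyword_stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to keep the sketch runnable."""
    return "Fridge" if "hungry" in prompt.lower() else "I am not sure."


if __name__ == "__main__":
    scene = ["Fridge", "Sofa", "Television"]  # illustrative object names
    print(resolve_target("I'm getting kind of hungry.", scene, keyword_stub_llm))
```

Restricting the answer to names that appear in the scene list keeps the navigation target grounded: a reply that mentions no listed object is rejected rather than passed to the planner.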