This work explores the capacity of large language models (LLMs) to address problems at the intersection of spatial planning and natural language interfaces for navigation. We focus on following complex instructions that are more akin to natural conversation than the explicit, procedural directives typically seen in robotics. Unlike most prior work, where navigation directives are provided as simple imperative commands (e.g., "go to the fridge"), we examine implicit directives obtained through conversational interactions. We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and we augment it by adding complex language queries for 40 object types. We demonstrate that a robot using our method, CARTIER (Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots), can parse descriptive language queries up to 42% more reliably than existing LLM-enabled methods by exploiting the ability of LLMs to interpret user interactions in the context of the objects present in the scenario.