This paper presents a framework that interprets human navigation commands containing temporal elements and translates the natural language instructions directly into robot motion plans. The framework is built around Large Language Models (LLMs). To make the LLMs more reliable within the framework and to improve the user experience, we propose methods for resolving ambiguity in natural language instructions and for capturing user preferences. The process begins with an ambiguity classifier that identifies potential uncertainties in an instruction. Ambiguous instructions trigger a GPT-4-based mechanism that generates clarifying questions and uses the user's responses to disambiguate them. For non-ambiguous instructions, the framework assesses and records user preferences to improve future interactions. Finally, the disambiguated instructions are translated into a robot motion plan using Linear Temporal Logic. The paper details the development of the framework and evaluates its performance in various test scenarios.
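The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list stands in for the learned ambiguity classifier, a fixed question template stands in for the GPT-4 question generator, and the names `is_ambiguous`, `resolve`, and `to_ltl` are hypothetical. The formula syntax (`F` for "eventually", `G` for "always", `!` for negation) is a common textual convention for Linear Temporal Logic and is assumed here, not taken from the paper.

```python
from typing import Callable, List

# Assumption: a toy keyword list replaces the paper's ambiguity classifier.
VAGUE_TERMS = {"somewhere", "something", "anywhere", "whatever"}


def is_ambiguous(instruction: str) -> bool:
    """Flag an instruction as ambiguous if it contains a vague term."""
    words = (w.strip(".,!?") for w in instruction.lower().split())
    return any(w in VAGUE_TERMS for w in words)


def resolve(
    instruction: str,
    ask_user: Callable[[str], str],
    preference_log: List[str],
) -> str:
    """Clarify ambiguous instructions; log unambiguous ones as preferences."""
    if is_ambiguous(instruction):
        # Assumption: a fixed template replaces the GPT-4 generated question.
        question = f"What exactly do you mean by: '{instruction}'?"
        answer = ask_user(question)
        return f"{instruction} [clarified: {answer}]"
    # Unambiguous instructions are recorded for use in later interactions.
    preference_log.append(instruction)
    return instruction


def to_ltl(ordered_goals: List[str], avoid: List[str]) -> str:
    """Build an LTL formula: visit goals in order, always avoid some regions."""
    visit = ""
    # Nest "eventually" operators from the last goal outward to encode order.
    for goal in reversed(ordered_goals):
        visit = f"F({goal} & {visit})" if visit else f"F({goal})"
    parts = [visit] if visit else []
    parts.extend(f"G(!{region})" for region in avoid)
    return " & ".join(parts)
```

In this sketch, the parsing step that extracts ordered goals and forbidden regions from the clarified instruction is left out, since in the framework that step is handled by the LLM. Calling `to_ltl(["kitchen", "bedroom"], ["stairs"])` yields `F(kitchen & F(bedroom)) & G(!stairs)`, which reads as "eventually reach the kitchen and afterwards the bedroom, while never entering the stairs".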