In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents that use large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs that classifies whether a command is certain (i.e., clear) or uncertain (i.e., ambiguous or infeasible). Once a command is classified as uncertain, we further classify it as ambiguous or infeasible by leveraging LLMs with situationally aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could reduce malfunctions and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness consisting of pairs of high-level commands, scene descriptions, and command-type labels (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset and in a pick-and-place tabletop simulation. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments, i.e., handover scenarios.
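
The two-stage flow described above (uncertainty check first, then ambiguous versus infeasible, then a clarifying question) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the uncertainty measure, so the sketch substitutes normalized entropy over repeatedly sampled LLM outputs, and the threshold value is arbitrary. All names (`entropy_uncertainty`, `classify_command`, `respond`, and the `sample_plan`, `label_uncertain`, `ask_question` callables standing in for LLM calls) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an LLM call: takes (command, scene), returns text.
LLMCall = Callable[[str, str], str]


def entropy_uncertainty(samples: List[str]) -> float:
    """Normalized Shannon entropy of sampled outputs (assumed measure).

    Returns 0.0 when all samples agree and about 1.0 when all differ.
    """
    counts = Counter(samples)
    n = len(samples)
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)  # log(n) is the maximum entropy for n samples


def classify_command(
    command: str,
    scene: str,
    sample_plan: LLMCall,
    label_uncertain: LLMCall,
    threshold: float = 0.5,  # arbitrary illustrative value
    n_samples: int = 5,
) -> str:
    """Return 'clear', 'ambiguous', or 'infeasible'."""
    # Stage 1: low disagreement among sampled plans is treated as "certain".
    samples = [sample_plan(command, scene) for _ in range(n_samples)]
    if entropy_uncertainty(samples) < threshold:
        return "clear"
    # Stage 2: zero-shot query with the scene description as context.
    label = label_uncertain(command, scene).strip().lower()
    if label not in ("ambiguous", "infeasible"):
        raise ValueError(f"unexpected label: {label!r}")
    return label


def respond(
    command: str,
    scene: str,
    sample_plan: LLMCall,
    label_uncertain: LLMCall,
    ask_question: LLMCall,
) -> Tuple[str, Optional[str]]:
    """Classify the command; generate a question only if it is ambiguous."""
    label = classify_command(command, scene, sample_plan, label_uncertain)
    question = ask_question(command, scene) if label == "ambiguous" else None
    return label, question
```

Passing the LLM calls in as plain callables keeps the sketch independent of any particular model API; in practice each callable would wrap a prompt that includes the command and the scene description.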