Large language models (LLMs) offer significant promise as a knowledge source for robotic task learning. Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but on its own it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied robotic agent learning novel tasks. We describe a cognitive-agent approach that extends and complements prompt engineering, mitigating its limitations, and thus enabling a robot to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences. The approach increases the response space of LLMs and deploys general strategies, embedded within the autonomous robot, to evaluate, repair, and select among candidate responses produced by the LLM. We present the approach and experiments showing that a robot, by retrieving and evaluating a breadth of responses from the LLM, can achieve >75% task completion in one-shot learning without user oversight. The approach achieves 100% task completion when human oversight (such as indication of preference) is provided, while greatly reducing how much human oversight is needed.
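The evaluate, repair, and select strategy described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the system reported here: the names `Candidate`, `select_response`, `KNOWN_WORDS`, and `SYNONYMS` are invented for illustration, the candidate responses are hard-coded instead of being sampled from an LLM, and the toy scoring rule (the fraction of words the agent already knows) only approximates the idea of checking a response against the agent's language capabilities and environment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Candidate:
    """A candidate LLM response together with its evaluation score."""
    text: str
    score: float


def select_response(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    repair: Callable[[str], Optional[str]],
    threshold: float = 1.0,
) -> Optional[Candidate]:
    """Evaluate each candidate, try one repair if it falls short, pick the best."""
    viable: List[Candidate] = []
    for text in candidates:
        score = evaluate(text)
        if score < threshold:
            repaired = repair(text)
            if repaired is None:
                continue  # nothing could be repaired: discard the candidate
            text, score = repaired, evaluate(repaired)
            if score < threshold:
                continue  # still not viable after repair
        viable.append(Candidate(text, score))
    # Highest score wins; ties go to the earliest candidate.
    return max(viable, key=lambda c: c.score, default=None)


# Hypothetical toy grounding: words the agent can parse and act on.
KNOWN_WORDS = {"put", "the", "mug", "in", "sink", "dishwasher"}
# Hypothetical repair knowledge: unknown word -> known equivalent.
SYNONYMS = {"cup": "mug"}


def evaluate(text: str) -> float:
    """Fraction of words in the response that the agent knows."""
    words = text.lower().split()
    return sum(w in KNOWN_WORDS for w in words) / len(words)


def repair(text: str) -> Optional[str]:
    """Replace unknown words with known synonyms; None if nothing changed."""
    original = text.lower()
    repaired = " ".join(SYNONYMS.get(w, w) for w in original.split())
    return repaired if repaired != original else None


# Stand-in for a breadth of responses retrieved from an LLM.
responses = [
    "put the cup in the sink",            # repairable: cup -> mug
    "place the mug inside the cupboard",  # not repairable with this toy table
    "put the mug in the dishwasher",      # viable as-is
]

best = select_response(responses, evaluate, repair)
print(best)
```

In this sketch two candidates end up equally viable and the tie is broken by order alone. That is the kind of point at which an indication of user preference, as mentioned above, could be requested instead.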