Large language models (LLMs) offer significant promise as a knowledge source for robotic task learning. Prompt engineering has been shown to be effective for eliciting knowledge from an LLM but alone is insufficient for acquiring relevant, situationally grounded knowledge for an embodied robotic agent learning novel tasks. We describe a cognitive-agent approach that extends and complements prompt engineering, mitigating its limitations, and thus enabling a robot to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences. The approach is to increase the response space of LLMs and deploy general strategies, embedded within the autonomous robot, to evaluate, repair, and select among candidate responses produced by the LLM. We describe the approach and experiments that show how a robot, by retrieving and evaluating a breadth of responses from the LLM, can achieve >75% task completion in one-shot learning without user oversight. The approach achieves 100% task completion when human oversight (such as indication of preference) is provided, while greatly reducing how much human oversight is needed.
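As a rough illustration of the evaluate, repair, and select strategy described above, the following Python sketch runs that loop over a handful of candidate responses. It is a minimal sketch under stated assumptions, not the implementation used in this work: every name in it (`Candidate`, `find_problems`, `select_response`, `toy_repair`, the word lists) is hypothetical, the two checks (known vocabulary, objects present in the environment) are simplified stand-ins for the agent's language and grounding analysis, and `toy_repair` is a stub standing in for re-prompting the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical stand-ins for what the robot can parse and which words name objects.
KNOWN_WORDS = {"put", "in", "the", "mug", "sink", "dishwasher", "cabinet"}
OBJECT_WORDS = {"mug", "sink", "dishwasher", "cabinet"}

Problem = Tuple[str, str]  # (kind of mismatch, offending word)


@dataclass
class Candidate:
    text: str     # one candidate response retrieved from the LLM
    score: float  # assumed ranking signal, e.g. a likelihood from the LLM


def find_problems(text: str, present: Set[str]) -> List[Problem]:
    """Evaluate a response against the robot's language and environment."""
    problems: List[Problem] = []
    for word in text.split():
        if word not in KNOWN_WORDS:
            problems.append(("unknown_word", word))
        elif word in OBJECT_WORDS and word not in present:
            problems.append(("not_present", word))
    return problems


def select_response(
    candidates: List[Candidate],
    present: Set[str],
    repair: Callable[[str, List[Problem]], Optional[str]],
    max_repairs: int = 1,
) -> Optional[Candidate]:
    """Evaluate each candidate, try to repair failures, pick the best survivor."""
    viable: List[Candidate] = []
    for cand in candidates:
        text, attempts = cand.text, 0
        while True:
            problems = find_problems(text, present)
            if not problems:
                viable.append(Candidate(text, cand.score))
                break
            if attempts >= max_repairs:
                break  # give up on this candidate
            repaired = repair(text, problems)
            if repaired is None:
                break  # no repair available
            text, attempts = repaired, attempts + 1
    # With no viable candidate, the agent would fall back to asking the user.
    return max(viable, key=lambda c: c.score) if viable else None


def toy_repair(text: str, problems: List[Problem]) -> Optional[str]:
    """Stub for re-prompting the LLM: swaps in known synonyms only."""
    synonyms = {"cup": "mug", "basin": "sink"}
    words = text.split()
    fixed = [synonyms.get(w, w) for w in words]
    return " ".join(fixed) if fixed != words else None


candidates = [
    Candidate("put the mug in the dishwasher", 0.5),  # no dishwasher present
    Candidate("put the cup in the sink", 0.3),        # "cup" is not a known word
    Candidate("put the mug in the cabinet", 0.2),     # viable as-is
]
best = select_response(candidates, {"mug", "sink", "cabinet"}, toy_repair)
```

In this toy run the highest-scoring candidate is rejected because it refers to an object that is not in the environment, the second is repaired and then selected, and the third remains available as an alternative. Retrieving several responses rather than one is what leaves the agent something to select from after evaluation.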