Large language models (LLMs) offer significant promise as a knowledge source for task learning. Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but on its own it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied agent learning novel tasks. We describe a cognitive-agent approach that extends and complements prompt engineering, mitigating its limitations and thus enabling an agent to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences. The approach increases the response space of the LLM and deploys general strategies, embedded within the autonomous agent, to evaluate, repair, and select among the candidate responses the LLM produces. We describe the approach and present experiments showing that an agent, by retrieving and evaluating a breadth of responses from the LLM, can achieve 77-94% task completion in one-shot learning without user oversight. The approach achieves 100% task completion when human oversight (such as an indication of preference) is provided. Further, the type of oversight largely shifts from explicit natural-language instruction to simple confirmation or disconfirmation of high-quality responses that the agent has vetted before presenting them to the user.
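To make the evaluate, repair, and select loop concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that candidate responses arrive as simple "verb object" strings and that the agent can check each word against what it knows. The names `AgentContext`, `evaluate`, `repair`, and `select`, the synonym-based repair step, and the sample candidates are all hypothetical and chosen only for illustration; no LLM is called.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical stand-in for what the agent knows about itself and its world."""
    known_verbs: set[str]
    visible_objects: set[str]
    # Assumed lookup from an unknown word to a word the agent already knows.
    synonyms: dict[str, str] = field(default_factory=dict)


def evaluate(response: str, ctx: AgentContext) -> list[str]:
    """Return the words of a 'verb object' response that the agent cannot ground."""
    verb, _, obj = response.partition(" ")
    problems = []
    if verb not in ctx.known_verbs:
        problems.append(verb)
    if obj not in ctx.visible_objects:
        problems.append(obj)
    return problems


def repair(response: str, ctx: AgentContext) -> str:
    """Toy repair: swap ungrounded words for known synonyms where one exists."""
    return " ".join(ctx.synonyms.get(word, word) for word in response.split(" "))


def select(candidates: list[str], ctx: AgentContext) -> list[str]:
    """Evaluate each candidate, attempt one repair, and keep the viable ones."""
    viable = []
    for cand in candidates:
        if evaluate(cand, ctx):          # something could not be grounded
            cand = repair(cand, ctx)
        if not evaluate(cand, ctx) and cand not in viable:
            viable.append(cand)
    return viable


# Illustrative data only: a breadth of candidate responses for one task step.
ctx = AgentContext(
    known_verbs={"put", "open"},
    visible_objects={"mug", "cupboard"},
    synonyms={"place": "put", "cup": "mug"},
)
candidates = ["place cup", "open cupboard", "rinse mug", "put mug"]
vetted = select(candidates, ctx)
print(vetted)
```

In this sketch, requesting several candidates instead of one is what gives the agent something to choose among, and only the vetted list would be shown to a user for confirmation or disconfirmation.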