We study building embodied agents for open-ended creative tasks. While existing methods build instruction-following agents that can perform diverse open-ended tasks, none of them demonstrates creativity, that is, the ability to produce novel and diverse solutions that are only implicit in the language instructions. This limitation stems from their inability to convert abstract language instructions into concrete goals and to perform long-horizon planning toward such complicated goals. Observing that humans perform creative tasks with imagination, we propose a class of solutions in which the controller is enhanced with an imaginator that generates detailed imaginations of task outcomes conditioned on language instructions. We introduce several approaches to implementing the components of creative agents. We implement the imaginator with either a large language model for textual imagination or a diffusion model for visual imagination. The controller can be either a behavior-cloning policy or a pre-trained foundation model that generates executable code in the environment. We benchmark creative tasks in the challenging open-world game Minecraft, where the agents create diverse buildings given free-form language instructions. We propose novel evaluation metrics for open-ended creative tasks utilizing GPT-4V, which hold many advantages over existing metrics. We perform a detailed experimental analysis of creative agents, showing that they are the first AI agents to accomplish diverse building creation in the survival mode of Minecraft. Our benchmark and models are open-source for future research on creative agents (https://github.com/PKU-RL/Creative-Agents).