We study how to build multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over those skills. Using the popular open-world game Minecraft as the testbed, we propose three types of fine-grained basic skills and use RL with intrinsic rewards to acquire them. A novel Finding-skill, which explores to find diverse items, provides better initialization for the other skills and improves the sample efficiency of skill learning. For skill planning, we leverage the prior knowledge in Large Language Models to find the relationships between skills and build a skill graph. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates a suitable skill plan for the agent. In experiments, our method accomplishes 40 diverse Minecraft tasks, many of which require sequentially executing more than 10 skills. Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method for solving Minecraft Tech Tree tasks. The project's website and code can be found at https://sites.google.com/view/plan4mc.
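To make the idea of walking a skill graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written graph (the abstract describes building the graph with a Large Language Model), and every name in it (`Skill`, `SKILLS`, `plan`, the skill names, and the item quantities) is hypothetical and chosen only for illustration. The search is a plain depth-first traversal that simulates the inventory and emits the skills in execution order.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """One node of an illustrative skill graph (hypothetical structure)."""
    name: str
    produces: str          # item obtained by executing the skill
    amount: int            # how many of that item one execution yields
    consumes: dict = field(default_factory=dict)  # item -> quantity used up
    requires: dict = field(default_factory=dict)  # item -> quantity needed but kept


# Hand-written toy graph, keyed by the item each skill produces.
# These entries are assumptions for the example, not taken from the paper.
SKILLS = {
    "log": Skill("harvest_log", "log", 1),
    "planks": Skill("craft_planks", "planks", 4, consumes={"log": 1}),
    "stick": Skill("craft_stick", "stick", 4, consumes={"planks": 2}),
    "crafting_table": Skill("craft_crafting_table", "crafting_table", 1,
                            consumes={"planks": 4}),
    "wooden_pickaxe": Skill("craft_wooden_pickaxe", "wooden_pickaxe", 1,
                            consumes={"planks": 3, "stick": 2},
                            requires={"crafting_table": 1}),
}


def plan(target, quantity=1, inventory=None, skills=SKILLS):
    """Depth-first search over the skill graph.

    Returns (steps, inventory): the ordered skill names to execute and the
    simulated inventory after executing them.
    """
    inventory = dict(inventory or {})
    steps = []

    def ensure(item, needed):
        skill = skills[item]
        prerequisites = {**skill.requires, **skill.consumes}
        while inventory.get(item, 0) < needed:
            # Satisfying one prerequisite can use up another (sticks consume
            # planks), so re-check until all of them hold at the same time.
            while True:
                missing = [(dep, n) for dep, n in prerequisites.items()
                           if inventory.get(dep, 0) < n]
                if not missing:
                    break
                for dep, n in missing:
                    ensure(dep, n)
            # Simulate executing the skill once.
            for dep, n in skill.consumes.items():
                inventory[dep] -= n
            inventory[item] = inventory.get(item, 0) + skill.amount
            steps.append(skill.name)

    ensure(target, quantity)
    return steps, inventory


if __name__ == "__main__":
    steps, final_inventory = plan("wooden_pickaxe")
    print(steps)
    print(final_inventory)
```

Simulating the inventory during the search is a design choice of this sketch: it lets the planner notice when an intermediate skill has consumed an item that a later skill still needs, and schedule another gathering step. The sketch assumes the graph is acyclic; it does not model learned skill policies, skill failures, or replanning.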