Recent work on designing appropriate distributions of environments has shown promise for training generally capable agents. This success is partly due to a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such environment design frameworks often struggle to find effective levels in challenging design spaces and require costly interactions with the environment. In this paper, we introduce diversity into the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method for identifying observed/hidden states that are representative of a given level. The output of this method is then used to characterize the diversity between two levels, which, as we show, can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate a self-play technique that allows the environment generator to automatically generate environments that are highly beneficial to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance compared with existing methods.