Recent work on designing an appropriate distribution of environments has shown promise for training effective, generally capable agents. Its success is partly due to a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such environment design frameworks often struggle to find effective levels in challenging design spaces and require costly interactions with the environment. In this paper, we aim to introduce diversity into the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then used to characterize the diversity between two levels, which, as we show, can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate a self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.
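To make the idea of characterizing diversity between two levels concrete, the following is a minimal illustrative sketch, not the DivSP method itself. It assumes each level has already been summarized by a small set of representative state vectors, and it uses a symmetric mean nearest-neighbour (Chamfer-style) distance as a stand-in diversity measure. The names `level_diversity` and `novelty_vs_buffer`, the vector representation of states, and the choice of distance are all assumptions made for illustration.

```python
import math
from typing import Sequence

# Assumption: a state is a fixed-length vector of floats, and a level is
# summarized by a non-empty collection of such representative states.
State = Sequence[float]


def _nearest_distance(state: State, others: Sequence[State]) -> float:
    """Euclidean distance from `state` to its closest state in `others`."""
    return min(math.dist(state, other) for other in others)


def level_diversity(states_a: Sequence[State], states_b: Sequence[State]) -> float:
    """Symmetric mean nearest-neighbour distance between two state sets.

    Hypothetical stand-in for a level-to-level diversity measure:
    0.0 when the two sets coincide, larger when they are far apart.
    """
    if not states_a or not states_b:
        raise ValueError("both levels need at least one representative state")
    a_to_b = sum(_nearest_distance(s, states_b) for s in states_a) / len(states_a)
    b_to_a = sum(_nearest_distance(s, states_a) for s in states_b) / len(states_b)
    return 0.5 * (a_to_b + b_to_a)


def novelty_vs_buffer(candidate: Sequence[State],
                      buffer: Sequence[Sequence[State]]) -> float:
    """Diversity of a candidate level with respect to its closest buffered level."""
    if not buffer:
        return float("inf")  # nothing to compare against yet
    return min(level_diversity(candidate, level) for level in buffer)


if __name__ == "__main__":
    level_1 = [(0.0, 0.0), (1.0, 0.0)]
    level_2 = [(0.0, 0.0), (1.0, 0.0)]
    level_3 = [(5.0, 5.0), (6.0, 5.0)]
    print(level_diversity(level_1, level_2))
    print(novelty_vs_buffer(level_3, [level_1, level_2]))
```

Under these assumptions, a level generator could prefer candidates with a high `novelty_vs_buffer` score so that newly added levels differ from those already collected. How the representative states are actually identified, and which distance is used, would follow the method described in the paper rather than this sketch.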