A key challenge in training generally capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose a solution for each. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established, challenging, procedurally generated environments, including a partially observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.