Unsupervised Environment Design (UED) has emerged as a promising approach to developing general-purpose agents through automated curriculum generation. Popular UED methods focus on Open-Endedness, where teacher algorithms rely on stochastic processes to generate an unbounded stream of useful environments. This reliance becomes impractical in resource-constrained settings where teacher-student interaction opportunities are limited. To address this challenge, we introduce a hierarchical Markov Decision Process (MDP) framework for environment design. In our framework, a teacher agent leverages student policy representations derived from discovered evaluation environments, enabling it to generate training environments tailored to the student's current capabilities. To improve efficiency, we incorporate a generative model that augments the teacher's training dataset with synthetic data, further reducing the number of teacher-student interactions required. In experiments across several domains, our method outperforms baseline approaches while requiring fewer teacher-student interactions per episode. These results suggest that our approach is well suited to settings where training opportunities are limited.
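The teacher-student loop described above can be illustrated with a minimal toy sketch. This is not the paper's algorithm: the `ToyStudent`, `ToyTeacher`, difficulty parameterization, and update rules are all hypothetical simplifications, standing in for the real student policy, the evaluation-environment-based capability representation, and the learned teacher policy of the hierarchical MDP. It only shows the shape of the interaction: the teacher scores the student on fixed evaluation environments, then proposes a training environment just beyond the student's estimated frontier.

```python
class ToyStudent:
    """Stand-in for the student policy: a scalar skill that improves
    when trained on environments it can partially solve."""

    def __init__(self, skill=0.0):
        self.skill = skill

    def evaluate(self, difficulty):
        # Success score in [0, 1], degrading linearly beyond current skill.
        return max(0.0, 1.0 - max(0.0, difficulty - self.skill))

    def train(self, difficulty):
        # Skill grows in proportion to how learnable the environment is.
        self.skill += 0.5 * self.evaluate(difficulty) * min(1.0, difficulty)


class ToyTeacher:
    """Stand-in for the teacher agent in the outer MDP: observes a
    capability representation (scores on fixed evaluation environments)
    and proposes the next training environment."""

    def __init__(self, eval_difficulties):
        self.eval_difficulties = eval_difficulties

    def capability(self, student):
        # Hypothetical student policy representation: per-environment scores.
        return [student.evaluate(d) for d in self.eval_difficulties]

    def propose(self, student):
        # Hardest eval environment the student mostly solves, plus a margin.
        frontier = 0.0
        for d, score in zip(self.eval_difficulties, self.capability(student)):
            if score > 0.5:
                frontier = max(frontier, d)
        return frontier + 0.2


def curriculum(n_steps):
    """Run the teacher-student interaction loop and return final skill."""
    student = ToyStudent()
    teacher = ToyTeacher([0.2 * i for i in range(1, 11)])
    for _ in range(n_steps):
        difficulty = teacher.propose(student)
        student.train(difficulty)
    return student.skill
```

In the paper's setting, each call to `propose`/`train` would be a costly teacher-student interaction; the generative-model augmentation (not sketched here) would let the teacher learn from synthetic capability/environment pairs instead of always querying the real student.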