Curriculum reinforcement learning (CRL) enables agents to solve complex tasks by generating a tailored sequence of learning tasks, starting with easy ones and gradually increasing their difficulty. Although the potential of curricula in RL has been clearly demonstrated in various works, it is less clear how to generate them for a given learning environment, which has motivated various methods that aim to automate this task. In this work, we focus on framing curricula as interpolations between task distributions, an approach previously shown to be viable for CRL. After identifying key issues with existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this approach to curriculum generation can improve upon existing CRL methods, yielding high performance across tasks with different characteristics.
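To make the idea of "interpolating between task distributions with optimal transport" concrete, here is a minimal sketch. It is not the constrained method of this work. It assumes tasks are described by a single scalar parameter (for example, a goal distance) and that the easy and hard task distributions are given as equal-size samples. In 1-D, sorting both samples yields an optimal transport coupling for any convex ground cost, so each intermediate distribution moves every easy task a fraction `alpha` of the way toward its matched hard task. The function names `displacement_interpolation_1d` and `curriculum` are hypothetical. The interpolation schedule is fixed by hand rather than chosen under any constraint.

```python
import numpy as np


def displacement_interpolation_1d(source, target, alpha):
    """Interpolate between two equal-size 1-D empirical task distributions.

    Sorting both samples gives an optimal transport matching in 1-D for a
    convex cost (e.g. squared distance). Each matched pair is then blended
    linearly: alpha=0 returns the source tasks, alpha=1 the target tasks.
    """
    source = np.sort(np.asarray(source, dtype=float))
    target = np.sort(np.asarray(target, dtype=float))
    if source.shape != target.shape:
        raise ValueError("source and target must have the same number of samples")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * source + alpha * target


def curriculum(source, target, num_stages):
    """Return `num_stages` task distributions moving from source to target.

    Hypothetical fixed schedule: alphas are evenly spaced in [0, 1]. A
    constrained approach would instead decide how far to move at each stage.
    """
    alphas = np.linspace(0.0, 1.0, num_stages)
    return [displacement_interpolation_1d(source, target, a) for a in alphas]


# Easy tasks: short goal distances; hard tasks: long goal distances.
stages = curriculum([1.5, 0.5, 1.0], [10.0, 8.0, 9.0], num_stages=3)
for stage in stages:
    print(stage)
```

With three stages, the middle distribution places tasks at 4.25, 5.0, and 5.75, halfway between each easy task and its matched hard task. Tasks in more than one dimension would need a general optimal transport solver instead of sorting.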