We consider the problem of curriculum design for reinforcement learning (RL) agents in contextual multi-task settings. Existing techniques for automatic curriculum design typically require domain-specific hyperparameter tuning or have limited theoretical underpinnings. To address these limitations, we design a curriculum strategy, ProCuRL, inspired by the pedagogical concept of the Zone of Proximal Development (ZPD). ProCuRL captures the intuition that learning progress is maximized by picking tasks that are neither too hard nor too easy for the learner. We mathematically derive ProCuRL by analyzing two simple learning settings. We also present a practical variant of ProCuRL that can be directly integrated with deep RL frameworks with minimal hyperparameter tuning. Experimental results on a variety of domains demonstrate that our curriculum strategy accelerates the training of deep RL agents compared with state-of-the-art baselines.
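
The "neither too hard nor too easy" intuition can be illustrated with a small sketch. The abstract does not state the selection criterion, so the scoring rule below, p * (p_star - p), is an assumption chosen for illustration: p is the learner's estimated probability of success on a task, and p_star is an assumed best achievable probability of success (defaulting to 1). The function names `zpd_scores` and `pick_task` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def zpd_scores(success_prob: Sequence[float],
               best_success_prob: Optional[Sequence[float]] = None) -> List[float]:
    """Score each task by p * (p_star - p).

    Assumed illustrative rule: the score is zero when the learner always
    fails (p = 0) or has already reached the best achievable success
    (p = p_star), and it peaks for tasks of intermediate difficulty.
    """
    if best_success_prob is None:
        # Assumption: every task is fully solvable by an optimal learner.
        best_success_prob = [1.0] * len(success_prob)
    if len(best_success_prob) != len(success_prob):
        raise ValueError("success_prob and best_success_prob must have equal length")
    return [p * (p_star - p) for p, p_star in zip(success_prob, best_success_prob)]


def pick_task(success_prob: Sequence[float],
              best_success_prob: Optional[Sequence[float]] = None) -> int:
    """Return the index of the highest-scoring task (first index on ties)."""
    scores = zpd_scores(success_prob, best_success_prob)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three tasks: nearly impossible, moderately hard, nearly mastered.
    print(pick_task([0.05, 0.5, 0.8]))
```

Under this assumed rule, the moderately hard task is selected over the nearly impossible and nearly mastered ones. In practice the success probabilities would have to be estimated, for example from rollouts or a learned value function; that estimation step is outside the scope of this sketch.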