Reinforcement learning (RL) -- algorithms that teach artificial agents to interact with environments by maximising reward signals -- has achieved significant success in recent years. These successes have been facilitated by advances in algorithms (e.g., deep Q-learning, deep deterministic policy gradients, proximal policy optimisation, trust region policy optimisation, and soft actor-critic) and specialised computational resources such as GPUs and TPUs. One promising research direction involves introducing goals to allow multimodal policies, commonly through hierarchical or curriculum reinforcement learning. These methods systematically decompose complex behaviours into simpler sub-tasks, analogous to how humans progressively learn skills (e.g., we learn to walk before we run, or we learn arithmetic before calculus). However, fully automating goal creation remains an open challenge. We present a novel probabilistic curriculum learning algorithm to suggest goals for reinforcement learning agents in continuous control and navigation tasks.
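To make the idea of probabilistic goal suggestion concrete, the following is a minimal illustrative sketch, not the algorithm presented in this work: it samples training goals from a probability distribution that favours goals of intermediate difficulty, estimated from the agent's recent success rate on each candidate goal. All class and method names here are hypothetical.

```python
import random


class ProbabilisticGoalCurriculum:
    """Hypothetical sketch of a probabilistic goal curriculum: goals whose
    estimated success rate is near 50% (neither trivial nor hopeless)
    are sampled most often."""

    def __init__(self, goals, smoothing=1.0):
        self.goals = list(goals)
        # Per-goal [successes, attempts], initialised with a smoothing prior
        # so every goal starts at a 50% estimated success rate.
        self.stats = {g: [smoothing, 2.0 * smoothing] for g in self.goals}

    def success_rate(self, goal):
        successes, attempts = self.stats[goal]
        return successes / attempts

    def weight(self, goal):
        # p * (1 - p) peaks at p = 0.5, so intermediate-difficulty goals
        # receive the highest sampling weight; the epsilon keeps every
        # goal reachable.
        p = self.success_rate(goal)
        return p * (1.0 - p) + 1e-3

    def sample_goal(self):
        weights = [self.weight(g) for g in self.goals]
        return random.choices(self.goals, weights=weights, k=1)[0]

    def update(self, goal, succeeded):
        successes, attempts = self.stats[goal]
        self.stats[goal] = [successes + (1.0 if succeeded else 0.0),
                            attempts + 1.0]


# Usage: goals are 1-D target distances in a toy navigation task; farther
# goals succeed less often, mimicking increasing difficulty.
curriculum = ProbabilisticGoalCurriculum(goals=[0.1, 0.5, 1.0, 2.0])
for _ in range(200):
    g = curriculum.sample_goal()
    curriculum.update(g, succeeded=random.random() < 1.0 / (1.0 + g))
```

Under this scheme the curriculum adapts automatically: as the agent masters near goals their sampling weight falls, shifting training toward harder, farther goals, which is the general spirit of automated goal creation discussed above.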