In multi-goal Reinforcement Learning, an agent can share experience between related training tasks, resulting in better generalization to new tasks at test time. However, when the goal space has discontinuities and the reward is sparse, most goals are difficult to reach. In this context, a curriculum over goals helps agents learn by adapting training tasks to their current capabilities. In this work we propose Stein Variational Goal Generation (SVGG), which samples goals of intermediate difficulty for the agent by leveraging a learned predictive model of its goal-reaching capabilities. The distribution of goals is modeled with particles that are attracted to areas of appropriate difficulty using Stein Variational Gradient Descent. We show that SVGG outperforms state-of-the-art multi-goal Reinforcement Learning methods in terms of success coverage in hard exploration problems, and demonstrate that it is endowed with a useful recovery property when the environment changes.
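To make the particle-based idea concrete, the following is a minimal sketch of Stein Variational Gradient Descent moving goal particles toward a region of intermediate difficulty. It is an illustration under stated assumptions, not the SVGG implementation: the success predictor `success_prob`, the toy target density $\log p(g) = -\beta\,(D(g) - 0.5)^2$, the RBF kernel with a median-heuristic bandwidth, and all parameter values are hypothetical choices made only for this example.

```python
import numpy as np

def success_prob(goals, radius=1.0, sharpness=4.0):
    """Hypothetical success predictor D(g): goals near the origin are
    easy (D close to 1), goals far from it are hard (D close to 0)."""
    dist = np.linalg.norm(goals, axis=1)
    return 1.0 / (1.0 + np.exp(-sharpness * (radius - dist)))

def grad_log_target(goals, radius=1.0, sharpness=4.0, beta=50.0):
    """Gradient of the toy log-density log p(g) = -beta * (D(g) - 0.5)**2,
    which is highest where the predicted success is 0.5 (assumed target)."""
    dist = np.linalg.norm(goals, axis=1, keepdims=True)
    d = success_prob(goals, radius, sharpness)[:, None]
    unit = goals / np.maximum(dist, 1e-8)  # radial direction, safe at the origin
    # Chain rule with dD/dg = -sharpness * D * (1 - D) * unit
    return 2.0 * beta * sharpness * (d - 0.5) * d * (1.0 - d) * unit

def svgd_step(particles, grad_log_p, step_size=0.02):
    """One SVGD update with an RBF kernel k(x, y) = exp(-||x - y||^2 / h)."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]  # diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(n + 1.0) + 1e-8       # median heuristic
    kernel = np.exp(-sq_dists / h)
    # Attraction: kernel-weighted average of the particles' log-density gradients
    drive = kernel @ grad_log_p(particles)
    # Repulsion: sum over j of the kernel gradient, keeps particles spread out
    repulse = (2.0 / h) * np.sum(kernel[:, :, None] * diffs, axis=1)
    return particles + step_size * (drive + repulse) / n

def run_svgd(n_particles=50, n_steps=1000, seed=0):
    """Start all goal particles in the 'too easy' region and run SVGD."""
    rng = np.random.default_rng(seed)
    start = rng.uniform(-0.3, 0.3, size=(n_particles, 2))
    particles = start.copy()
    for _ in range(n_steps):
        particles = svgd_step(particles, grad_log_target)
    return start, particles

if __name__ == "__main__":
    start, end = run_svgd()
    print("mean |D - 0.5| before:", np.mean(np.abs(success_prob(start) - 0.5)))
    print("mean |D - 0.5| after: ", np.mean(np.abs(success_prob(end) - 0.5)))
```

In this sketch the attraction term pulls particles toward goals whose predicted success is near 0.5, while the repulsion term prevents them from collapsing onto a single goal, so the particle set covers the whole band of intermediate difficulty. In an actual curriculum, the success predictor would be learned from the agent's rollouts and the target density would be whatever the method prescribes.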