The usability of Reinforcement Learning is restricted by the long computation times it requires. Curriculum Reinforcement Learning speeds up learning by defining a helpful order in which an agent encounters tasks, i.e., from simple to hard. Curricula based on Absolute Learning Progress (ALP) have proven successful in different environments, but waste computation on repeating already-learned behaviour in new tasks. We solve this problem by introducing a new regularization method based on Self-Paced (Deep) Learning, called Self-Paced Absolute Learning Progress (SPALP). We evaluate our method in three different environments. Our method achieves performance comparable to the original ALP in all cases, and reaches it faster than ALP in two of them. We outline ways to further improve the efficiency and performance of SPALP.