In multi-task reinforcement learning, the data efficiency of training agents can be improved by transferring knowledge from different but related tasks. Because the experiences collected in each task are usually biased toward that task's goal, traditional methods rely on Kullback-Leibler regularization to stabilize the transfer of knowledge between tasks. In this work, we explore replacing the Kullback-Leibler divergence with a novel optimal transport-based regularization. Using the Sinkhorn mapping, we approximate the optimal transport distance between the state distributions of tasks. This distance is then used as an amortized reward to regularize the amount of information shared. We evaluate our framework on several grid-based multi-goal navigation tasks to validate the effectiveness of the approach. The results show that the added optimal transport-based rewards speed up the agents' learning and that the resulting method outperforms several baselines on multi-task learning.
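To make the Sinkhorn step concrete, the following is a minimal sketch of an entropy-regularized optimal transport computation between two state-visitation histograms on a small grid. It is an illustration under stated assumptions, not the implementation used in this work: the 4x4 grid, the Manhattan ground cost, the regularization strength `eps`, the iteration count, and the function name `sinkhorn_distance` are all hypothetical choices.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, eps=0.5, n_iters=200):
    """Approximate the OT cost between histograms a and b (assumed strictly
    positive and summing to 1) using standard Sinkhorn scaling iterations."""
    K = np.exp(-cost / eps)               # Gibbs kernel of the ground cost
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # rescale to match column marginal b
        u = a / (K @ v)                   # rescale to match row marginal a
    plan = u[:, None] * K * v[None, :]    # approximate transport plan
    return float(np.sum(plan * cost)), plan

# Hypothetical setup: states are the cells of a 4x4 grid,
# and the ground cost is the Manhattan distance between cells.
coords = np.array([(i, j) for i in range(4) for j in range(4)])
cost = np.abs(coords[:, None, :] - coords[None, :, :]).sum(-1).astype(float)

# Two illustrative state distributions: task A mostly visits cells near
# corner (0, 0), task B mostly visits cells near corner (3, 3).
a = np.exp(-coords.sum(1).astype(float))
a /= a.sum()
b = np.exp(-(6.0 - coords.sum(1)))
b /= b.sum()

d_ab, plan_ab = sinkhorn_distance(a, b, cost)
d_aa, _ = sinkhorn_distance(a, a, cost)
print(f"Sinkhorn cost A vs B: {d_ab:.3f}")
print(f"Sinkhorn cost A vs A: {d_aa:.3f}")
```

Tasks whose agents visit similar states yield a small value and dissimilar tasks a larger one, which is the kind of signal that can be folded into a reward. Because of the entropic regularization, the cost of a distribution against itself is small but not exactly zero. How the distance enters the reward (sign, scaling, amortization) is not specified in this abstract, so that step is left out of the sketch.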