In this paper, we study Tiered Reinforcement Learning, a parallel transfer learning framework in which the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task, reducing the exploration risk of the latter while the two tasks are solved in parallel. Unlike previous work, we do not assume that the low-tier and high-tier tasks share the same dynamics or reward functions, and we focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition for our objective, called ``Optimal Value Dominance''. Under this condition, we propose novel online learning algorithms that, for the high-tier task, achieve constant regret on a subset of states determined by the task similarity and retain near-optimal regret when the two tasks are dissimilar, while for the low-tier task they remain near-optimal without any sacrifice. Moreover, we study the setting with multiple low-tier tasks and propose a novel transfer source selection mechanism that ensembles the information from all low-tier tasks and provides provable benefits on a much larger state-action space.