In this paper, we study Tiered Reinforcement Learning, a parallel transfer learning framework in which knowledge is transferred from a low-tier (source) task to a high-tier (target) task to reduce the exploration risk of the latter while the two tasks are solved in parallel. Unlike previous work, we do not assume that the low-tier and high-tier tasks share the same dynamics or reward functions, and we focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition for this objective, which we call "Optimal Value Dominance". Under this condition, we propose novel online learning algorithms with the following guarantees: for the high-tier task, they achieve constant regret on a subset of states that depends on the task similarity and retain near-optimal regret when the two tasks are dissimilar; for the low-tier task, they remain near-optimal without any sacrifice. We further study the setting with multiple low-tier tasks and propose a novel transfer source selection mechanism that combines the information from all low-tier tasks and yields provable benefits on a much larger state-action space.