Transfer learning can accelerate training in deep reinforcement learning by transferring knowledge from a policy learned in a related source task to a policy for a target task. This is commonly achieved by copying pretrained weights from the source policy to the target policy before training, which requires both policies to share the same model architecture. However, this approach requires a robust representation learned over a wide distribution of states and often fails to transfer between specialist models trained on single tasks; it is also largely uninterpretable and gives little indication of what knowledge is transferred. In this work, we propose an alternative approach to transfer learning between tasks based on action advising, in which a teacher trained in a source task actively guides a student's exploration in a target task. Through introspection, the teacher identifies when advice is beneficial to the student and should be given, and when it is not. Our approach allows knowledge transfer between policies regardless of their underlying representations, and we empirically show that it improves convergence rates in Gridworld and Atari environments while providing insight into what knowledge is transferred.
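The action-advising loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how introspection decides when advice is beneficial, so the gate below is a stand-in and not the authors' criterion: it assumes the teacher exposes action values and advises only when its best action beats the runner-up by a margin. The names `should_advise`, `select_action`, `teacher_q_fn`, `student_policy`, and `margin` are all hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple


def should_advise(teacher_q: Sequence[float], margin: float) -> bool:
    """Hypothetical introspection gate (an assumption, not the paper's test).

    Advise only when the teacher's best action value exceeds its
    second-best value by at least `margin`.
    """
    ranked = sorted(teacher_q, reverse=True)
    return len(ranked) > 1 and (ranked[0] - ranked[1]) >= margin


def select_action(
    state: Hashable,
    teacher_q_fn: Callable[[Hashable], Sequence[float]],
    student_policy: Callable[[Hashable], int],
    margin: float = 0.5,
) -> Tuple[int, bool]:
    """Return (action, advised) for one step in the target task.

    The teacher only needs to map a state to action values, so the
    student may use any architecture or representation.
    """
    q = teacher_q_fn(state)
    if should_advise(q, margin):
        # Teacher is confident: the student follows the advised action.
        advised_action = max(range(len(q)), key=q.__getitem__)
        return advised_action, True
    # Otherwise the student explores with its own policy.
    return student_policy(state), False


# Toy usage: the teacher is confident in state "a" but not in state "b".
toy_teacher = {"a": [1.0, 0.1], "b": [0.5, 0.4]}
action_a, advised_a = select_action("a", toy_teacher.__getitem__, lambda s: 1)
action_b, advised_b = select_action("b", toy_teacher.__getitem__, lambda s: 1)
```

Logging the `advised` flag per state is one simple way to inspect where the teacher's knowledge is actually used, which is the kind of insight into what is transferred that the abstract points to.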