Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning

We argue that one of the main obstacles to developing effective Continual Reinforcement Learning (CRL) algorithms is the negative transfer that occurs when a new task arrives. Through comprehensive experimental validation, we demonstrate that this issue arises frequently in CRL and cannot be effectively addressed by several recent works on mitigating plasticity loss in RL agents. To that end, we develop Reset & Distill (R&D), a simple yet highly effective method for overcoming the negative transfer problem in CRL. R&D combines two components: resetting the agent's online actor and critic networks to learn each new task, and an offline learning step that distills knowledge from the action probabilities of the online actor and the previous expert. We carried out extensive experiments on long sequences of Meta-World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.
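The two components named in the abstract, a reset before each new task and an offline distillation step over stored action probabilities, can be illustrated with a toy sketch. This is not the authors' implementation: the tabular per-state logits, the KL objective, the learning rate, and every name below (`reset_logits`, `distill`, `expert_buffer`, `online_buffer`) are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def reset_logits(n_actions, rng):
    """Hypothetical 'reset': re-draw parameters from the initial distribution."""
    return [rng.gauss(0.0, 0.01) for _ in range(n_actions)]


def distill(student, buffer, steps=200, lr=0.5):
    """Gradient descent on KL(teacher || softmax(student logits)).

    For softmax logits, the gradient of this KL with respect to the
    logits is softmax(logits) - teacher_probs.
    """
    for _ in range(steps):
        for state, teacher_probs in buffer:
            probs = softmax(student[state])
            student[state] = [
                z - lr * (s - t)
                for z, s, t in zip(student[state], probs, teacher_probs)
            ]
    return student


def mean_kl(student, buffer):
    """Average KL between stored teacher probabilities and the student."""
    return sum(
        kl_divergence(t, softmax(student[s])) for s, t in buffer
    ) / len(buffer)


rng = random.Random(0)
n_actions = 3

# Stand-ins for stored action probabilities (made-up numbers):
# one entry from a previous expert, one from the freshly trained online actor.
expert_buffer = [("task1_state", [0.8, 0.1, 0.1])]
online_buffer = [("task2_state", [0.1, 0.1, 0.8])]
buffer = expert_buffer + online_buffer

# The student here plays the role of the offline policy that receives the
# distilled knowledge; it starts from freshly drawn parameters.
student = {state: reset_logits(n_actions, rng) for state, _ in buffer}

kl_before = mean_kl(student, buffer)
distill(student, buffer)
kl_after = mean_kl(student, buffer)
print(f"mean KL before: {kl_before:.4f}, after: {kl_after:.6f}")
```

In this toy setting the student ends up matching both stored distributions. How the actual method parameterizes the networks, fills its buffers, and weights the two distillation sources is not specified in the abstract and is not captured here.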

