We argue that one of the main obstacles to developing effective Continual Reinforcement Learning (CRL) algorithms is the negative transfer that occurs when a new task arrives. Through comprehensive experimental validation, we demonstrate that this issue arises frequently in CRL and cannot be effectively addressed by several recent works on mitigating plasticity loss in RL agents. To that end, we develop Reset & Distill (R&D), a simple yet highly effective method for overcoming the negative transfer problem in CRL. R&D combines two components: resetting the agent's online actor and critic networks to learn each new task, and an offline learning step that distills knowledge from the online actor and from the previous expert's action probabilities. We carried out extensive experiments on long sequences of Meta-World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and the need for robust strategies like R&D to mitigate its detrimental effects.
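To make the two components of R&D concrete, the following is a minimal sketch and not the authors' implementation. It assumes a discrete action space, a tabular policy stored as per-state logits, and a KL-divergence distillation objective; the abstract specifies none of these. The names `reset_policy`, `distillation_loss`, `online_batch`, and `expert_batch` are hypothetical and introduced only for illustration.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def reset_policy(states, num_actions, rng):
    """Hypothetical stand-in for the 'reset' step: re-initialize the online
    policy with small random logits before learning a new task."""
    return {s: [rng.gauss(0.0, 0.01) for _ in range(num_actions)] for s in states}


def distillation_loss(student, online_batch, expert_batch):
    """Assumed form of the 'distill' step for the offline (student) policy.

    student:      dict mapping state -> logits
    online_batch: list of (state, action_probs) from the online actor
                  trained on the new task
    expert_batch: list of (state, action_probs) stored from the previous
                  expert on earlier tasks
    """
    new_term = sum(
        kl_divergence(probs, softmax(student[s])) for s, probs in online_batch
    ) / max(len(online_batch), 1)
    old_term = sum(
        kl_divergence(probs, softmax(student[s])) for s, probs in expert_batch
    ) / max(len(expert_batch), 1)
    # Equal weighting of the two terms is an assumption of this sketch.
    return new_term + old_term
```

In this sketch the loss is zero when the student already reproduces both sets of target action probabilities and grows as it deviates from either, which is the intended effect of distilling from the new-task actor while retaining the previous expert's behavior.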