We argue that negative transfer, which can occur whenever a new task to learn arrives, is an important problem that must not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive experimental validation, we demonstrate that this issue arises frequently in CRL and cannot be effectively addressed by several recent methods for mitigating the plasticity loss of RL agents. To address this, we develop Reset & Distill (R&D), a simple yet highly effective method for overcoming negative transfer in CRL. R&D combines two steps: resetting the agent's online actor and critic networks to learn each new task from scratch, and an offline learning step that distills knowledge from the action probabilities of the online actor and the previous experts. We carried out extensive experiments on long sequences of Meta World tasks and show that our method consistently outperforms recent baselines, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of accounting for negative transfer in CRL and emphasize the need for robust strategies, such as R&D, to mitigate its detrimental effects.
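To make the two-step structure concrete, the following is a minimal sketch of the loop the abstract describes, not the authors' implementation. It assumes hypothetical helpers (`make_actor`, `make_critic`, a `train_online` routine for the chosen RL algorithm, and per-task `state_buffers`), and it uses a KL loss over discrete action logits purely for simplicity; in the paper's Meta World setting the actor would be a continuous-action policy, so the matching term would instead compare the corresponding action distributions.

```python
# Hypothetical sketch of Reset & Distill: reset the online networks for each
# new task, train them online, then distill the online actor and previous
# experts into a single continually-learned actor via offline cloning.
import copy
import torch
import torch.nn.functional as F

def reset_and_distill(task, distilled_actor, expert_actors, make_actor,
                      make_critic, train_online, state_buffers, epochs=10):
    # 1) Reset: fresh online actor/critic so the new task is learned from
    #    scratch, avoiding negative transfer from earlier tasks.
    online_actor, online_critic = make_actor(), make_critic()
    train_online(online_actor, online_critic, task)

    # 2) Distill: offline step matching the distilled actor's action
    #    probabilities to those of the previous experts and the new online
    #    actor. `state_buffers[i]` holds states collected on task i; the
    #    last buffer corresponds to the newly learned task.
    teachers = expert_actors + [online_actor]
    opt = torch.optim.Adam(distilled_actor.parameters(), lr=3e-4)
    for _ in range(epochs):
        for teacher, states in zip(teachers, state_buffers):
            with torch.no_grad():
                target_logits = teacher(states)  # teacher action logits
            loss = F.kl_div(
                F.log_softmax(distilled_actor(states), dim=-1),
                F.softmax(target_logits, dim=-1),
                reduction="batchmean",
            )
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Keep a frozen copy of the new online actor as the expert for this task.
    expert_actors.append(copy.deepcopy(online_actor))
    return distilled_actor
```

One design point worth noting: because the online networks are discarded and re-initialized for every task, all cross-task retention lives in the distilled actor and the stored experts, which is what decouples learning a new task from remembering old ones.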