Continual learning with deep neural networks presents challenges distinct from both the fixed-dataset and convex continual learning regimes. One such challenge is plasticity loss, wherein a neural network trained in an online fashion displays a degraded ability to fit new tasks. This problem has been extensively studied in both supervised learning and off-policy reinforcement learning (RL), where a number of remedies have been proposed. Still, plasticity loss has received less attention in the on-policy deep RL setting. Here we perform an extensive set of experiments examining plasticity loss and a variety of mitigation methods in on-policy deep RL. We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even performing worse than applying no intervention at all. In contrast, we find that a class of ``regenerative'' methods is able to consistently mitigate plasticity loss in a variety of contexts, including in gridworld tasks and more challenging environments like Montezuma's Revenge and ProcGen.
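For concreteness, below is a minimal sketch of one well-known intervention in the regenerative family, shrink-and-perturb (Ash & Adams, 2020): at a task or domain boundary, every weight is scaled toward zero and perturbed with Gaussian noise, partially reinitializing the network while retaining some learned structure. This is an illustrative PyTorch sketch, not the exact procedure or hyperparameters used in our experiments; the `shrink` and `noise_std` values are assumptions chosen for readability.

```python
import torch


def shrink_and_perturb(model: torch.nn.Module,
                       shrink: float = 0.8,
                       noise_std: float = 0.01) -> None:
    """Soft-reset a network in place: theta <- shrink * theta + noise.

    A hedged sketch of a regenerative-style reset (shrink-and-perturb);
    `shrink` and `noise_std` are illustrative, not tuned settings.
    """
    with torch.no_grad():
        for param in model.parameters():
            # Scale the weight toward zero, then add Gaussian noise so the
            # network regains trainability without a full reinitialization.
            param.mul_(shrink)
            param.add_(noise_std * torch.randn_like(param))


# Hypothetical usage at a domain-shift boundary in an on-policy loop:
# shrink_and_perturb(policy_network)
```

The design choice that makes such resets "regenerative" is that they restore parameter-level trainability (small, noisy weights) rather than constraining the optimizer or the loss, which is the axis along which we find the consistently effective methods differ from the unsuccessful ones.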