Continual learning with deep neural networks presents challenges distinct from both the fixed-dataset and convex continual learning regimes. One such challenge is plasticity loss, wherein a neural network trained in an online fashion displays a degraded ability to fit new tasks. This problem has been extensively studied in both supervised learning and off-policy reinforcement learning (RL), where a number of remedies have been proposed. Still, plasticity loss has received less attention in the on-policy deep RL setting. Here we perform an extensive set of experiments examining plasticity loss and a variety of mitigation methods in on-policy deep RL. We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even resulting in performance that is worse than no intervention at all. In contrast, we find that a class of "regenerative" methods consistently mitigates plasticity loss in a variety of contexts, including gridworld tasks and more challenging environments like Montezuma's Revenge and ProcGen.