Continual learning algorithms strive to acquire new knowledge while preserving prior information. Often, these algorithms emphasise stability and restrict network updates when learning new tasks. In many cases, such restrictions come at a cost to the model's plasticity, i.e. its ability to adapt to the requirements of a new task. But is all change detrimental? Here, we approach this question by proposing that activation spaces in neural networks can be decomposed into two subspaces: a readout range, in which change affects prior tasks, and a null space, in which change does not alter prior performance. Based on experiments with this novel technique, we show that, indeed, not all activation change is associated with forgetting. Instead, only change in the subspace visible to the readout of a task can lead to decreased stability, while restricting change outside of this subspace is associated only with a loss of plasticity. Analysing various commonly used algorithms, we show that regularisation-based techniques do not fully disentangle the two spaces and, as a result, restrict plasticity more than necessary. We expand our results by investigating a linear model in which we can manipulate learning in the two subspaces directly and thus causally link activation changes to stability and plasticity. For hierarchical, nonlinear cases, we present an approximation that enables us to estimate functionally relevant subspaces at every layer of a deep nonlinear network, corroborating our previous insights. Together, this work provides a novel means of deriving insights into the mechanisms behind stability and plasticity in continual learning and may serve as a diagnostic tool to guide the development of future continual learning algorithms that stabilise inference while allowing maximal space for learning.
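For the linear case, the decomposition into a readout range and a null space can be illustrated with a minimal sketch. It assumes a single linear readout matrix and uses the pseudoinverse to build an orthogonal projector onto the readout's row space. The function name `split_activation_change`, the matrix shapes, and the random data are hypothetical choices for illustration. This is not the authors' implementation, and it does not cover the approximation for deep nonlinear networks mentioned above.

```python
import numpy as np


def split_activation_change(readout, delta):
    """Split an activation change into the part a linear readout can see
    and the part it cannot (hypothetical helper, linear case only)."""
    # Orthogonal projector onto the row space of the readout matrix.
    projector = np.linalg.pinv(readout) @ readout
    visible = projector @ delta   # component in the readout range
    hidden = delta - visible      # component in the readout's null space
    return visible, hidden


# Assumed toy setup: 3 output units reading from 10 hidden units.
rng = np.random.default_rng(0)
readout = rng.normal(size=(3, 10))
delta = rng.normal(size=10)       # activation change after learning a new task

visible, hidden = split_activation_change(readout, delta)

# Effect of each component on the old task's outputs.
output_shift_total = readout @ delta
output_shift_visible = readout @ visible
output_shift_hidden = readout @ hidden

print("output shift from full change:   ", output_shift_total)
print("output shift from visible part:  ", output_shift_visible)
print("max |shift| from null-space part:", np.abs(output_shift_hidden).max())
```

In this sketch, the entire shift in the old task's outputs comes from the visible component, while the null-space component leaves the outputs unchanged up to floating-point error. Under these assumptions, constraining the null-space component would cost plasticity without protecting the old readout.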