Continual learning algorithms strive to acquire new knowledge while preserving prior information. Often, these algorithms emphasise stability and restrict network updates upon learning new tasks. In many cases, such restrictions come at a cost to the model's plasticity, i.e. its ability to adapt to the requirements of a new task. But is all change detrimental? Here, we approach this question by proposing that the activation space of a neural network can be decomposed into two subspaces: a readout range, in which change affects prior tasks, and a null space, in which change leaves prior performance unaltered. Experiments with this decomposition show that, indeed, not all activation change is associated with forgetting. Only change in the subspace visible to a task's readout can decrease stability, whereas restricting change outside this subspace merely costs plasticity. Analysing various commonly used algorithms, we show that regularisation-based techniques do not fully disentangle the two subspaces and, as a result, restrict plasticity more than necessary. We extend these results with a linear model in which learning in the two subspaces can be manipulated directly, causally linking activation changes to stability and plasticity. For hierarchical, nonlinear cases, we present an approximation that estimates the functionally relevant subspaces at every layer of a deep nonlinear network, corroborating our earlier insights. Together, this work provides novel means to probe the mechanisms behind stability and plasticity in continual learning and may serve as a diagnostic tool guiding the development of future continual learning algorithms that stabilise inference while allowing maximal space for learning.
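To make the proposed decomposition concrete, the following is a minimal numpy sketch of the linear case, assuming a frozen linear readout for a prior task; the names and shapes (`W`, `delta_h`) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical frozen readout W of a previously learned task
# (n_classes x n_features), and a change delta_h in the layer's activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 512))   # linear readout weights (illustrative)
delta_h = rng.standard_normal(512)   # activation change from learning a new task

# Orthonormal basis for the row space of W: the "readout range".
# Any activation change inside this subspace alters the prior task's outputs.
U = np.linalg.svd(W, full_matrices=False)[2].T   # (512 x 10) basis columns

# Split delta_h into its readout-range component and its null-space
# component (the orthogonal complement, invisible to the readout).
delta_range = U @ (U.T @ delta_h)
delta_null = delta_h - delta_range

# Sanity check: null-space change leaves the prior task's outputs untouched.
assert np.allclose(W @ delta_null, 0.0, atol=1e-8)
```

Under this linear picture, only `delta_range` can harm stability, while `delta_null` is free capacity for learning; the per-layer approximation described in the abstract is what extends this idea to deep nonlinear networks.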