Catastrophic forgetting in continual adaptation is usually studied through parameter drift, replay, or distillation, but these views do not identify which output-space directions are vulnerable. We give a function-space account in the neural tangent kernel (NTK) regime: training on a new task induces drift in old-task predictions through the cross-task kernel, which yields a closed-form predictor of the forgetting vector that can be evaluated before any new-task gradient step is taken. In frozen-backbone, linear-head PEFT-CL, where the model is linear in its trainable parameters, the predictor is exact up to numerical precision; for nonlinear adapters or full fine-tuning, it is a local NTK approximation. The same expression shows that forgetting concentrates in a small number of old-task NTK eigenmodes and, under frozen linear heads, gives a Kronecker scaling rule for the vulnerable rank. These results clarify the relation to prior NTK-overlap theory, explain why parameter-space regularizers can miss output-space interference, and motivate a targeted spectral regularizer.
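To make the frozen-backbone, linear-head case concrete, the following is a minimal sketch and not the paper's own implementation. It assumes a squared loss, full-batch gradient descent, and random Gaussian features standing in for a frozen backbone; all names, sizes, and hyperparameters are hypothetical. Under those assumptions the old-task prediction drift can be computed from the cross-task kernel and the initial new-task residual alone, and it matches the drift measured after actually training the head.

```python
import numpy as np

rng = np.random.default_rng(0)
n_old, n_new, d, c = 6, 5, 8, 3            # hypothetical sizes
phi_old = rng.normal(size=(n_old, d))      # stand-in frozen features, old task
phi_new = rng.normal(size=(n_new, d))      # stand-in frozen features, new task
y_new = rng.normal(size=(n_new, c))        # new-task regression targets
W0 = rng.normal(size=(d, c))               # linear head before new-task training
lr, steps = 0.01, 50                       # assumed step size and step count


def predict_forgetting(phi_old, phi_new, W0, y_new, lr, steps):
    """Predict old-task drift in function space, without updating the head."""
    k_cross = phi_old @ phi_new.T          # cross-task kernel (old x new)
    k_new = phi_new @ phi_new.T            # new-task kernel (new x new)
    residual = phi_new @ W0 - y_new        # new-task residual at initialization
    drift = np.zeros((phi_old.shape[0], W0.shape[1]))
    for _ in range(steps):
        # Each step moves old-task outputs by -lr * K_cross @ residual.
        drift -= lr * (k_cross @ residual)
        # The new-task residual itself evolves through the new-task kernel.
        residual = residual - lr * (k_new @ residual)
    return drift


def measure_forgetting(phi_old, phi_new, W0, y_new, lr, steps):
    """Train the head by gradient descent and measure the old-task drift."""
    W = W0.copy()
    for _ in range(steps):
        grad = phi_new.T @ (phi_new @ W - y_new)   # gradient of 0.5 * ||residual||^2
        W -= lr * grad
    return phi_old @ W - phi_old @ W0


predicted = predict_forgetting(phi_old, phi_new, W0, y_new, lr, steps)
actual = measure_forgetting(phi_old, phi_new, W0, y_new, lr, steps)
print("max abs gap:", np.abs(predicted - actual).max())
```

Because the model is linear in the head weights, the two computations agree up to floating-point error. With a nonlinear adapter or full fine-tuning the kernels would change during training, so the same calculation would hold only as a local approximation.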