Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting earlier ones, is a central goal of artificial intelligence. To better understand its underlying mechanisms, we study the limitations of continual learning in a tractable yet representative setting. Specifically, we analyze one-hidden-layer quadratic neural networks trained by gradient descent on a sequence of XOR-cluster datasets with Gaussian noise, where different tasks correspond to clusters with orthogonal means. Our analysis is based on a tight characterization of gradient descent dynamics for the training loss, which yields explicit bounds on the rate of train-time forgetting as functions of the number of iterations, sample size, number of tasks, and hidden-layer width. We then leverage an algorithmic stability framework to bound the generalization gap, leading to corresponding guarantees on test-time forgetting. Together, our results provide the first closed-form guarantees for forgetting in continual learning with neural networks and show how key problem parameters jointly govern forgetting dynamics. Numerical experiments corroborate our theoretical results.
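To make the setting concrete, the following is a minimal sketch of the kind of experiment the abstract describes: a one-hidden-layer quadratic network trained by full-batch gradient descent on a sequence of XOR-cluster tasks whose means are mutually orthogonal. Everything beyond that description is an assumption made for illustration rather than the paper's exact setup: the logistic loss, the fixed second layer with weights ±1/m, the small Gaussian initialization, unit-norm means, the default hyperparameters, and all function names. Train-time forgetting is measured here as the increase in a task's training loss between the end of its own training phase and the end of the whole sequence.

```python
import numpy as np


def make_xor_task(rng, mu_pos, mu_neg, n, noise_std):
    """Sample n points from a four-cluster XOR mixture (assumed form).

    Clusters centered at +/-mu_pos get label +1; clusters at +/-mu_neg get -1.
    """
    labels = rng.choice([-1.0, 1.0], size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    centers = np.where(labels[:, None] > 0, mu_pos, mu_neg) * signs[:, None]
    x = centers + noise_std * rng.standard_normal((n, mu_pos.shape[0]))
    return x, labels


def forward(W, a, x):
    """Quadratic network f(x) = sum_j a_j * (w_j . x)^2."""
    return ((x @ W.T) ** 2) @ a


def logistic_loss(W, a, x, y):
    """Mean logistic loss log(1 + exp(-y f(x))) (loss choice is an assumption)."""
    return np.mean(np.logaddexp(0.0, -y * forward(W, a, x)))


def grad_W(W, a, x, y):
    """Gradient of the mean logistic loss with respect to W (a stays fixed)."""
    pre = x @ W.T                                   # (n, m) pre-activations
    margin = y * ((pre ** 2) @ a)                   # (n,)
    # dl/df = -y * sigmoid(-margin), with sigmoid written stably via tanh
    dl_df = -y * 0.5 * (1.0 - np.tanh(0.5 * margin))
    # df/dw_j = 2 * a_j * (w_j . x) * x
    coeff = dl_df[:, None] * (2.0 * pre * a[None, :])   # (n, m)
    return coeff.T @ x / x.shape[0]                 # (m, d)


def train(W, a, x, y, steps, lr):
    """Plain full-batch gradient descent on the first-layer weights."""
    for _ in range(steps):
        W = W - lr * grad_W(W, a, x, y)
    return W


def run_sequence(num_tasks=3, d=20, m=16, n=200, steps=200, lr=0.05,
                 noise_std=0.3, seed=0):
    """Train on tasks in order; return the loss matrix and per-task forgetting."""
    rng = np.random.default_rng(seed)
    # Orthonormal columns; task k uses columns 2k and 2k+1 as its two means
    # (requires d >= 2 * num_tasks).
    basis = np.linalg.qr(rng.standard_normal((d, 2 * num_tasks)))[0]
    tasks = [make_xor_task(rng, basis[:, 2 * k], basis[:, 2 * k + 1], n, noise_std)
             for k in range(num_tasks)]
    W = 0.1 * rng.standard_normal((m, d))                 # assumed small init
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / m    # assumed fixed layer
    losses = np.zeros((num_tasks, num_tasks))
    for t, (x, y) in enumerate(tasks):
        W = train(W, a, x, y, steps, lr)
        for k, (xk, yk) in enumerate(tasks):
            losses[t, k] = logistic_loss(W, a, xk, yk)    # loss on task k after phase t
    # Forgetting of task k: final loss minus loss right after task k was learned.
    forgetting = losses[-1] - np.diag(losses)
    return losses, forgetting


if __name__ == "__main__":
    losses, forgetting = run_sequence()
    print("loss matrix (row: after training task t, column: task k):")
    print(losses)
    print("train-time forgetting per task:", forgetting)
```

Sweeping `steps`, `n`, `num_tasks`, and `m` in `run_sequence` is one way to probe empirically how the quantities named in the abstract (iterations, sample size, number of tasks, hidden-layer width) affect the measured forgetting. A test-time variant would evaluate the same loss on freshly sampled data from each task instead of the training sets.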