We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization loss in the high-dimensional regime that holds for arbitrary linear teachers. We show that isotropic regularization mitigates the effect of label noise in both the single-teacher and multiple-i.i.d.-teacher settings, whereas prior work accommodating multiple teachers either did not employ regularization or relied on memory-demanding methods. Furthermore, we prove that the optimal fixed regularization strength scales nearly linearly with the number of tasks $T$, specifically as $T/\ln T$. To our knowledge, this is the first such result in theoretical continual learning. Finally, we validate our theoretical findings through experiments on linear regression and neural networks, illustrating how this scaling law affects generalization and offering a practical recipe for designing continual learning systems.
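As a concrete illustration of the setting, the following minimal sketch simulates continual ridge-regularized linear regression and compares a few fixed regularization strengths against the $T/\ln T$ scaling from the abstract. It assumes details the abstract does not fix: each task regularizes toward the previous task's weights (a standard continual ridge setup), a single shared linear teacher, isotropic Gaussian features, and illustrative choices of dimension, sample size, and noise level.

```python
import numpy as np

# Minimal simulation sketch of overparameterized continual linear regression
# with isotropic L2 regularization. Assumptions beyond the abstract: each
# task regularizes toward the previous task's solution, a single shared
# teacher w_star, Gaussian features, and the illustrative constants below.

rng = np.random.default_rng(0)
d, n, T, sigma = 200, 50, 40, 0.5             # overparameterized: n < d
w_star = rng.standard_normal(d) / np.sqrt(d)  # arbitrary linear teacher

def continual_ridge(lam):
    """Sequentially fit T tasks; each step solves in closed form
    argmin_w ||X_t w - y_t||^2 + lam * ||w - w_prev||^2."""
    w = np.zeros(d)
    for _ in range(T):
        X = rng.standard_normal((n, d)) / np.sqrt(d)
        y = X @ w_star + sigma * rng.standard_normal(n)  # noisy labels
        # Normal equations: (X^T X + lam I) w = X^T y + lam w_prev
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w)
    return np.sum((w - w_star) ** 2)          # parameter-space risk proxy

# Compare fixed strengths against the T/ln(T) scaling for the final risk.
for lam in [0.1, 1.0, T / np.log(T), 10.0 * T]:
    losses = [continual_ridge(lam) for _ in range(5)]
    print(f"lambda = {lam:8.2f}  ->  risk ~ {np.mean(losses):.4f}")
```

The printed risks are only indicative: the sketch averages a handful of runs of a single synthetic configuration, whereas the paper's closed-form expression characterizes the expected generalization loss in the high-dimensional limit.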