In continual learning, catastrophic forgetting depends on several properties of the tasks. Previous work has analyzed the effects of task similarity and of overparameterization on forgetting separately. In contrast, our paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model. Specifically, we focus on two-task continual linear regression, where the second task is a random orthogonal transformation of an arbitrary first task (an abstraction of random permutation tasks). We derive an exact analytical expression for the expected forgetting and uncover a nuanced pattern. In highly overparameterized models, intermediate task similarity causes the most forgetting. However, near the interpolation threshold, forgetting decreases monotonically with the expected task similarity. We validate our findings with linear regression on synthetic data, and with neural networks on established permutation task benchmarks.
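The two-task setting described above can be illustrated with a minimal numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian data, the function names `random_orthogonal` and `forgetting`, the use of a rotation restricted to the first `m` of `p` coordinates as a stand-in for task similarity (smaller `m` meaning more similar tasks), the choice to keep the labels unchanged in the second task, and training each task to the interpolating solution closest to the previous weights.

```python
import numpy as np


def random_orthogonal(p, m, rng):
    """Return a p x p orthogonal matrix that rotates only the first m coordinates.

    Hypothetical helper: m = 0 gives the identity (identical tasks),
    m = p gives a random rotation of the whole space.
    """
    O = np.eye(p)
    if m == 0:
        return O
    # QR of a Gaussian matrix yields an orthogonal factor; the sign
    # correction makes its distribution uniform over orthogonal matrices.
    q, r = np.linalg.qr(rng.standard_normal((m, m)))
    q = q * np.sign(np.diag(r))
    O[:m, :m] = q
    return O


def forgetting(n, p, m, rng):
    """Task-1 mean squared error after sequentially fitting task 1, then task 2."""
    # Task 1: arbitrary data with noiseless linear labels (assumed here).
    X1 = rng.standard_normal((n, p))
    y = X1 @ rng.standard_normal(p)

    # Task 2: the same samples under a random orthogonal transformation,
    # with the labels kept unchanged (an assumption of this sketch).
    X2 = X1 @ random_orthogonal(p, m, rng)

    # Fit task 1 from zero: the minimum-norm least-squares solution.
    w1 = np.linalg.pinv(X1) @ y
    # Fit task 2 starting from w1: the task-2 solution closest to w1.
    w2 = w1 + np.linalg.pinv(X2) @ (y - X2 @ w1)

    # In the overparameterized case (p >= n) w1 fits task 1 exactly,
    # so the task-1 loss of w2 is the forgetting.
    return np.mean((X1 @ w2 - y) ** 2)
```

Averaging `forgetting(n, p, m, np.random.default_rng(seed))` over many seeds while sweeping `m` from `0` to `p`, once with `p` much larger than `n` and once with `p` close to `n`, gives a rough empirical view of how similarity and overparameterization interact in this toy setup. It is not a substitute for the exact expression derived in the paper.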