A fundamental challenge in developing general learning algorithms is their tendency to forget past knowledge when adapting to new data. Addressing this problem requires a principled understanding of forgetting; yet, despite decades of study, no unified definition has emerged that provides insight into the underlying dynamics of learning. We propose an algorithm- and task-agnostic theory that characterises forgetting as a lack of self-consistency in a learner's predictive distribution, manifesting as a loss of predictive information. Our theory naturally yields a general measure of an algorithm's propensity to forget and shows that exact Bayesian inference allows for adaptation without forgetting. To validate the theory, we design a comprehensive set of experiments spanning classification, regression, generative modelling, and reinforcement learning. We empirically demonstrate that forgetting is present across all deep learning settings and plays a significant role in determining learning efficiency. Together, these results establish a principled understanding of forgetting and lay the foundation for analysing and improving the information retention capabilities of general learning algorithms.
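The abstract's framing of forgetting as a loss of predictive information can be illustrated with a minimal, hypothetical sketch: train a simple learner on an old task, record its predictive log-likelihood on that task's data, continue training on a conflicting new task, and measure how much old-task predictive log-likelihood is lost. The logistic model, the gradient-ascent fit, and the two toy tasks below are illustrative stand-ins, not the paper's actual measure or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pred(w, X, y):
    # Mean Bernoulli log-likelihood of labels y under a logistic model w
    # (a hypothetical stand-in for a learner's predictive distribution).
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def fit(w, X, y, steps=200, lr=0.5):
    # Plain gradient ascent on the log-likelihood, starting from w.
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w + lr * X.T @ (y - p) / len(y)
    return w

# Old task: labels follow the +x direction; new task: the conflicting -x direction.
X_old = rng.normal(size=(200, 2)); y_old = (X_old[:, 0] > 0).astype(float)
X_new = rng.normal(size=(200, 2)); y_new = (X_new[:, 0] < 0).astype(float)

w = fit(np.zeros(2), X_old, y_old)     # adapt to the old task
ll_before = log_pred(w, X_old, y_old)  # old-task predictive log-likelihood
w = fit(w, X_new, y_new)               # sequentially adapt to the new task
ll_after = log_pred(w, X_old, y_old)

# Forgetting as the drop in predictive log-likelihood on old data.
forgetting = ll_before - ll_after
print(forgetting > 0)  # sequential training degrades old-task predictions
```

A positive `forgetting` value indicates the learner's predictions on old data became less consistent after adaptation, matching the intuition of forgetting as lost predictive information.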