The infrequent occurrence of overfitting in deep neural networks is perplexing: contrary to theoretical expectations, increasing model size often enhances performance in practice. But what if overfitting does occur, only restricted to specific sub-regions of the data space? In this work, we propose a novel score that captures the forgetting rate of deep models on validation data. We posit that this score quantifies local overfitting: a decline in performance confined to certain regions of the data space. We then show empirically that local overfitting occurs regardless of whether traditional overfitting is present. Using the framework of deep over-parametrized linear models, we offer a theoretical characterization of forgotten knowledge, and show that it correlates with the knowledge forgotten by real deep models. Finally, we devise a new ensemble method that aims to recover forgotten knowledge, relying solely on the training history of a single network. When combined with self-distillation, this method enhances the performance of any trained model without adding inference costs. Extensive empirical evaluations demonstrate the efficacy of our method across multiple datasets, contemporary neural network architectures, and training protocols.
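To make the notion of a forgetting rate on validation data concrete, here is a minimal sketch of one natural formulation: a validation sample counts as "forgotten" if some earlier training checkpoint classified it correctly but the final model does not. The function name and the exact definition are illustrative assumptions, not necessarily the paper's precise score.

```python
import numpy as np

def forgetting_score(correct_per_epoch):
    """Fraction of validation samples that were classified correctly at
    some earlier checkpoint but are misclassified by the final model.

    correct_per_epoch: bool array of shape [checkpoints, n_val], where
    entry [t, i] is True if checkpoint t classifies sample i correctly.
    (Illustrative definition, assumed for this sketch.)
    """
    correct = np.asarray(correct_per_epoch, dtype=bool)
    ever_correct = correct[:-1].any(axis=0)  # correct at any earlier checkpoint
    final_wrong = ~correct[-1]               # wrong under the final model
    return float((ever_correct & final_wrong).mean())

# Toy training history: 4 checkpoints, 5 validation samples.
history = np.array([
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],  # final model
], dtype=bool)
print(forgetting_score(history))  # samples 1 and 2 are forgotten -> 0.4
```

A high score under such a definition would indicate local overfitting even when aggregate validation accuracy is flat or improving, since losses on some samples can be masked by gains on others.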