United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit

Deep neural networks have become the method of choice for solving many classification tasks, largely because they can fit very complex functions defined over raw data. The downside of such powerful learners is the danger of overfit. In this paper, we introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting by combining models generated at specific intermediate epochs during training. Our method incorporates useful knowledge that the models acquire during the overfitting phase, knowledge that is usually lost when early stopping is used, without degrading overall performance. To motivate this approach, we begin with a theoretical analysis of a regression model, whose prediction (that the variance among classifiers increases when overfit occurs) is demonstrated empirically in commonly used deep networks. Guided by these results, we construct a new ensemble-based prediction method, in which the prediction is determined by the class that attains the most consensual prediction throughout the training epochs. Using multiple image and text classification datasets, we show that when regular ensembles suffer from overfit, our method eliminates the harmful reduction in generalization due to overfit, and often even surpasses the performance obtained by early stopping. Our method is easy to implement and can be integrated with any training scheme and architecture, without additional prior knowledge beyond the training set. It is thus a practical and useful tool to overcome overfit. Code is available at https://github.com/uristern123/United-We-Stand-Using-Epoch-wise-Agreement-of-Ensembles-to-Combat-Overfit.
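To make the prediction rule concrete, here is a minimal pure-Python sketch under one assumed reading of "the class that attains the most consensual prediction throughout the training epochs": at every epoch, measure the fraction of ensemble members voting for each class, then predict the class whose agreement peaks highest across epochs. The data layout (`preds[e][m][i]`), all function names, the tie-breaking rule, and the toy predictions are hypothetical illustrations; the authors' implementation is in the linked repository and may differ (for example, in which epochs it includes).

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical layout: preds[e][m][i] is the class predicted at epoch e
# by ensemble member m for test sample i.
Preds = Sequence[Sequence[Sequence[int]]]


def agreement(preds: Preds, epoch: int, sample: int) -> Dict[int, float]:
    """Fraction of ensemble members voting for each class at one epoch."""
    members = preds[epoch]
    votes = Counter(member[sample] for member in members)
    return {c: k / len(members) for c, k in votes.items()}


def mean_disagreement(preds: Preds, epoch: int) -> float:
    """Average fraction of members that disagree with the plurality class at one epoch."""
    n_samples = len(preds[epoch][0])
    total = sum(1.0 - max(agreement(preds, epoch, i).values()) for i in range(n_samples))
    return total / n_samples


def last_epoch_majority(preds: Preds) -> List[int]:
    """Baseline: plain majority vote of the ensemble at the final epoch."""
    last = preds[-1]
    return [Counter(m[i] for m in last).most_common(1)[0][0] for i in range(len(last[0]))]


def epoch_wise_agreement_predict(preds: Preds) -> List[int]:
    """Predict, per sample, the class whose agreement peaks highest across epochs.

    Assumed tie-breaking: higher agreement summed over all epochs, then smaller class id.
    """
    n_samples = len(preds[0][0])
    out = []
    for i in range(n_samples):
        peak: Dict[int, float] = {}
        total: Dict[int, float] = {}
        for e in range(len(preds)):
            for c, a in agreement(preds, e, i).items():
                peak[c] = max(peak.get(c, 0.0), a)
                total[c] = total.get(c, 0.0) + a
        out.append(max(peak, key=lambda c: (peak[c], total[c], -c)))
    return out


# Invented toy predictions: 3 epochs x 3 members x 2 samples.
toy_preds = [
    [[0, 1], [0, 1], [1, 1]],  # epoch 0: mostly in agreement
    [[0, 1], [0, 1], [0, 1]],  # epoch 1: full agreement
    [[1, 0], [2, 0], [1, 1]],  # epoch 2: members drift apart (overfit-like)
]

print(epoch_wise_agreement_predict(toy_preds))  # [0, 1]
print(last_epoch_majority(toy_preds))           # [1, 0]
print([round(mean_disagreement(toy_preds, e), 3) for e in range(3)])  # [0.167, 0.0, 0.333]
```

Taking the peak over epochs lets a class that the ensemble agreed on earlier in training win even after the members drift apart, which mirrors the observation that disagreement among classifiers grows as overfit sets in; the last-epoch majority vote, by contrast, follows the drifted members.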



Overfitting


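In AI, overfitting (also called overlearning) usually means that a model produced by machine learning is too complex: it performs well on the training set but poorly on the test set, i.e., it generalizes poorly. It occurs because the training data contain sampling error, and during parameter fitting a complex model fits that sampling error as well.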
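The mechanism described here, a high-capacity model fitting the sampling error in its training data, can be illustrated with a small pure-Python sketch. It assumes a hypothetical 1-D regression task (noisy samples of y = 2x + 1) and uses 1-nearest-neighbour regression as the stand-in for an "overly complex" model and a least-squares line as the simple one; these choices, the noise level, and the sample sizes are illustrative assumptions, not taken from the paper.

```python
import random


def make_data(n, noise, rng):
    """Noisy samples of y = 2x + 1; the Gaussian noise plays the role of sampling error."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a low-capacity model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_1nn(xs, ys):
    """1-nearest-neighbour regression: memorises every training point, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
train_x, train_y = make_data(50, 0.5, rng)
test_x, test_y = make_data(500, 0.5, rng)

for name, fit in [("line", fit_line), ("1-NN", fit_1nn)]:
    model = fit(train_x, train_y)
    train_err = mse(model, train_x, train_y)
    test_err = mse(model, test_x, test_y)
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The 1-NN model reproduces every training label exactly, so its training error is zero, but the noise it memorised carries over to unseen points; the line cannot fit the noise and so has nonzero training error.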
