The practical success of deep learning has led to the discovery of several surprising phenomena. One of these phenomena, which has spurred intense theoretical research, is ``benign overfitting'': deep neural networks seem to generalize well in the over-parametrized regime even though they perfectly fit noisy training data. It is now known that benign overfitting also occurs in various classical statistical models. For linear maximum margin classifiers, benign overfitting has been established theoretically in a class of mixture models under very strong assumptions on the covariate distribution. However, even in this simple setting, many questions remain open. For instance, most of the existing literature focuses on the noiseless case, where all true class labels are observed without error, whereas the more interesting noisy case remains poorly understood. We provide a comprehensive study of benign overfitting for linear maximum margin classifiers. We discover a previously unknown phase transition in the test error bounds for the noisy model and provide geometric intuition for it. We further considerably relax the required covariate assumptions in both the noisy and the noiseless case. Our results demonstrate that benign overfitting of maximum margin classifiers holds in a much wider range of scenarios than was previously known and provide new insights into the underlying mechanisms.
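For concreteness, a minimal sketch of the standard formalization of this setting (the symbols $\mu$, $z_i$, and $\eta$ are illustrative placeholders; the exact distributional assumptions vary across the literature and are relaxed in our results): each covariate is drawn as
\[
  x_i = \tilde y_i\, \mu + z_i, \qquad \tilde y_i \in \{\pm 1\},
\]
where $\mu \in \mathbb{R}^d$ is the class mean direction and $z_i$ is a centered noise vector; in the noisy case, the observed label $y_i$ equals the true label $\tilde y_i$ with probability $1 - \eta$ and $-\tilde y_i$ otherwise. The linear maximum margin (hard-margin SVM) classifier is then
\[
  \hat w \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \|w\|_2
  \quad \text{subject to} \quad y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i \le n,
\]
and benign overfitting means that $\hat w$ interpolates the (noisy) training labels while its test error nonetheless approaches the optimal error as the dimension $d$ grows relative to the sample size $n$.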