The practical success of deep learning has led to the discovery of several surprising phenomena. One of these phenomena, which has spurred intense theoretical research, is ``benign overfitting'': deep neural networks seem to generalize well in the over-parametrized regime even though they perfectly fit noisy training data. It is now known that benign overfitting also occurs in various classical statistical models. For linear maximum margin classifiers, benign overfitting has been established theoretically in a class of mixture models under very strong assumptions on the covariate distribution. However, even in this simple setting, many questions remain open. For instance, most of the existing literature focuses on the noiseless case, where all true class labels are observed without error, whereas the more interesting noisy case remains poorly understood. We provide a comprehensive study of benign overfitting for linear maximum margin classifiers. We discover a previously unknown phase transition in the test error bounds for the noisy model and provide geometric intuition for it. We further considerably relax the required covariate assumptions in both the noisy and the noiseless case. Our results demonstrate that benign overfitting of maximum margin classifiers holds in a much wider range of scenarios than was previously known and provide new insights into the underlying mechanisms.
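For concreteness, a minimal sketch of the standard formalization of this setting (the symbols $\mu$, $z_i$, and $\eta$ are illustrative placeholders; the exact distributional assumptions vary across the literature and are relaxed in our results): each covariate is drawn as
\[
  x_i = \tilde y_i\, \mu + z_i, \qquad \tilde y_i \in \{\pm 1\},
\]
where $\mu \in \mathbb{R}^d$ is the class mean direction and $z_i$ is a centered noise vector; in the noisy case, the observed label $y_i$ equals the true label $\tilde y_i$ with probability $1 - \eta$ and $-\tilde y_i$ otherwise. The linear maximum margin (hard-margin SVM) classifier is then
\[
  \hat w \;=\; \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \|w\|_2
  \quad \text{subject to} \quad y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i \le n,
\]
and benign overfitting means that $\hat w$ interpolates the (noisy) training labels while its test error nonetheless approaches the optimal error as the dimension $d$ grows relative to the sample size $n$.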