The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data that decompose into the sum of a common signal component and a random noise component, which lie in mutually orthogonal subspaces. We characterize conditions on the signal-to-noise ratio (SNR) of the model parameters that give rise to benign versus non-benign (or harmful) overfitting: if the SNR is high, benign overfitting occurs; conversely, if the SNR is low, harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property, and show that leaky ReLU networks trained with the hinge loss via gradient descent (GD) satisfy this property. In contrast to prior work, we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = \Omega(n^2 \log n)$, here we require only $d = \Omega(n)$.
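The setting above can be sketched concretely. The snippet below is a minimal illustration, not the paper's exact construction: the signal direction, noise scale, network width, and variable names (`mu`, `xi`, `W`, `a`, `alpha`) are assumptions chosen for clarity. It generates inputs $x_i = y_i \mu + \xi_i$ with the signal $\mu$ and noise $\xi_i$ in mutually orthogonal subspaces, and evaluates a two-layer leaky ReLU network under the hinge loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 100, 8      # sample size, input dimension (d = Omega(n)), network width
alpha = 0.1               # leaky ReLU negative-side slope (illustrative value)

# --- data model: signal along coordinate 0, noise in the orthogonal complement ---
mu = np.zeros(d)
mu[0] = 1.0                                   # common signal direction
y = rng.choice([-1.0, 1.0], size=n)           # binary labels
xi = rng.normal(scale=0.5, size=(n, d))
xi[:, 0] = 0.0                                # restrict noise to the subspace orthogonal to mu
X = y[:, None] * mu[None, :] + xi             # x_i = y_i * mu + xi_i
assert np.allclose(xi @ mu, 0.0)              # orthogonality holds by construction

# --- two-layer leaky ReLU network with a trainable first layer ---
W = rng.normal(scale=0.1, size=(m, d))        # first-layer weights (trained by GD in the paper)
a = rng.choice([-1.0, 1.0], size=m) / m       # second-layer weights (fixed, a common simplification)

def leaky_relu(z):
    return np.where(z > 0, z, alpha * z)

def f(X):
    """Network output for each input row, shape (n,)."""
    return leaky_relu(X @ W.T) @ a

def hinge_loss(X, y):
    """Average hinge loss max(0, 1 - y * f(x)) over the sample."""
    return np.mean(np.maximum(0.0, 1.0 - y * f(X)))

print(hinge_loss(X, y))                       # training loss before any GD steps
```

Fixing the second-layer weights and training only `W` is a standard simplification in this line of analysis; the hinge loss drives GD toward interpolating solutions whose margins govern whether the resulting overfitting is benign or harmful.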