Benign overfitting in leaky ReLU networks with moderate input dimension

The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie in subspaces orthogonal to one another. We characterize conditions on the signal-to-noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs; conversely, if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on the hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work, we do not require near-orthogonality conditions on the training data: notably, for input dimension $d$ and training sample size $n$, while prior work shows asymptotically optimal error when $d = \Omega(n^2 \log n)$, here we require only $d = \Omega\left(n \log \frac{1}{\epsilon}\right)$ to obtain error within $\epsilon$ of optimal.
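The setting described in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' implementation; it only assumes one plausible instance of the setup: a signal along the first coordinate axis, Gaussian noise restricted to the orthogonal complement, a fraction of flipped training labels, fixed outer weights, and full-batch gradient descent on the hinge loss over the inner weights. The names `make_data`, `train`, `signal_strength`, and `flip_frac`, along with all default values, are hypothetical choices made for illustration.

```python
import numpy as np


def make_data(n, d, signal_strength, flip_frac, rng):
    """Sample x = signal_strength * y * v + noise, with noise orthogonal to v.

    Assumption: v is the first standard basis vector and the noise is
    Gaussian on the remaining d - 1 coordinates, scaled to norm about 1.
    """
    v = np.zeros(d)
    v[0] = 1.0
    y_clean = rng.choice([-1.0, 1.0], size=n)
    noise = rng.standard_normal((n, d)) / np.sqrt(d)
    noise[:, 0] = 0.0  # keep the noise in the subspace orthogonal to v
    X = signal_strength * y_clean[:, None] * v + noise
    y = y_clean.copy()
    flips = rng.random(n) < flip_frac  # label noise on the observed labels
    y[flips] *= -1.0
    return X, y, y_clean


def leaky_relu(z, alpha):
    return np.where(z > 0, z, alpha * z)


def forward(W, a, X, alpha):
    """Two-layer network f(x) = sum_j a_j * leaky_relu(<w_j, x>)."""
    return leaky_relu(X @ W.T, alpha) @ a


def hinge_loss(W, a, X, y, alpha):
    return np.mean(np.maximum(0.0, 1.0 - y * forward(W, a, X, alpha)))


def train(X, y, m=16, alpha=0.1, lr=0.5, steps=2000, seed=0):
    """Full-batch (sub)gradient descent on the hinge loss over W only.

    Assumption: outer weights a_j = +/- 1/sqrt(m) are fixed, and W starts
    from a small Gaussian initialization.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.01 * rng.standard_normal((m, d))
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)
    for _ in range(steps):
        pre = X @ W.T                                # (n, m) pre-activations
        out = leaky_relu(pre, alpha) @ a             # (n,) network outputs
        active = (y * out < 1.0).astype(float)       # samples inside the margin
        slope = np.where(pre > 0, 1.0, alpha)        # leaky ReLU derivative
        coeff = (active * y)[:, None] * slope * a[None, :]
        W += lr * (coeff.T @ X) / n                  # step along the negative subgradient
    return W, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, _ = make_data(n=40, d=400, signal_strength=2.0, flip_frac=0.1, rng=rng)
    W, a = train(X, y)
    X_test, _, y_test = make_data(n=2000, d=400, signal_strength=2.0, flip_frac=0.0, rng=rng)
    train_acc = np.mean(np.sign(forward(W, a, X, 0.1)) == y)
    test_acc = np.mean(np.sign(forward(W, a, X_test, 0.1)) == y_test)
    print(f"train accuracy (noisy labels): {train_acc:.3f}")
    print(f"test accuracy (clean labels):  {test_acc:.3f}")
```

Varying `signal_strength` while holding `n` and `d` fixed is one way to probe the high-SNR and low-SNR regimes the abstract contrasts; the thresholds at which behavior changes in this toy setup need not match the paper's conditions.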

