The problem of benign overfitting asks whether a model can perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data that can be decomposed into the sum of a common signal and a random noise component, which lie in mutually orthogonal subspaces. We characterize conditions on the signal-to-noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs; conversely, if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained with the hinge loss by gradient descent (GD) satisfy this property. In contrast to prior work, we do not require near-orthogonality conditions on the training data: notably, for input dimension $d$ and training sample size $n$, while prior work shows asymptotically optimal error when $d = \Omega(n^2 \log n)$, here we require only $d = \Omega\left(n \log \frac{1}{\epsilon}\right)$ to obtain error within $\epsilon$ of optimal.
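To make the setting concrete, the following is a minimal NumPy sketch of the kind of experiment the abstract describes, not the authors' implementation. Every specific here is an assumption made for illustration: the function names (`make_data`, `forward`, `hinge_loss`, `train`), the signal strength `gamma`, the choice of unit-norm noise, the fraction of flipped labels `flip_frac`, the fixed outer weights of magnitude $1/\sqrt{m}$, the small random initialization, and the step size.

```python
import numpy as np

def make_data(n, d, gamma, flip_frac, rng):
    """Sample x_i = sqrt(gamma) * c_i * v + sqrt(1 - gamma) * noise_i.

    v is a unit signal direction, c_i is the clean label, and noise_i is a
    unit vector in the subspace orthogonal to v (an illustrative choice).
    """
    v = np.zeros(d)
    v[0] = 1.0
    clean = rng.choice([-1.0, 1.0], size=n)
    noise = rng.standard_normal((n, d))
    noise[:, 0] = 0.0  # project out the signal direction
    noise /= np.linalg.norm(noise, axis=1, keepdims=True)
    X = np.sqrt(gamma) * clean[:, None] * v + np.sqrt(1.0 - gamma) * noise
    y = clean.copy()
    y[: int(flip_frac * n)] *= -1.0  # label noise: flip the first few labels
    return X, y, clean, v

def forward(W, a, X, alpha):
    """Two-layer leaky ReLU network with trainable W and fixed outer weights a."""
    Z = X @ W.T
    return np.where(Z > 0, Z, alpha * Z) @ a

def hinge_loss(W, a, X, y, alpha):
    """Sum of hinge losses max(0, 1 - y_i f(x_i)) over the sample."""
    return np.maximum(0.0, 1.0 - y * forward(W, a, X, alpha)).sum()

def train(X, y, m=16, alpha=0.1, lr=0.02, steps=3000, seed=0):
    """Full-batch (sub)gradient descent on the hinge loss, inner layer only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 1e-3 * rng.standard_normal((m, d))  # small random initialization
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)
    for _ in range(steps):
        Z = X @ W.T
        margins = y * (np.where(Z > 0, Z, alpha * Z) @ a)
        active = margins < 1.0  # points with nonzero hinge loss
        if not active.any():
            break  # every training point has margin at least 1
        slope = np.where(Z > 0, 1.0, alpha)  # leaky ReLU derivative
        coef = -(active * y)[:, None] * slope * a  # shape (n, m)
        W -= lr * (coef.T @ X)  # subgradient of the summed hinge loss
    return W, a
```

One possible use is to call `make_data` with $d$ much larger than $n$, train with `train`, and compare the training hinge loss against the error on fresh samples labeled by `clean` while varying `gamma`; this only probes the high- versus low-SNR contrast empirically and does not reproduce the paper's bounds.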