The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data that decompose into the sum of a common signal component and a random noise component, which lie in mutually orthogonal subspaces. We characterize conditions on the signal-to-noise ratio (SNR) of the model parameters that give rise to benign versus non-benign (or harmful) overfitting: if the SNR is high, benign overfitting occurs; conversely, if the SNR is low, harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property, and show that leaky ReLU networks trained with the hinge loss via gradient descent (GD) satisfy this property. In contrast to prior work, we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = \Omega(n^2 \log n)$, here we require only $d = \Omega(n)$.
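The setting above can be sketched concretely. The snippet below is a minimal illustration, not the paper's exact construction: the signal direction, noise scale, network width, and variable names (`mu`, `xi`, `W`, `a`, `alpha`) are assumptions chosen for clarity. It generates inputs $x_i = y_i \mu + \xi_i$ with the signal $\mu$ and noise $\xi_i$ in mutually orthogonal subspaces, and evaluates a two-layer leaky ReLU network under the hinge loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 100, 8      # sample size, input dimension (d = Omega(n)), network width
alpha = 0.1               # leaky ReLU negative-side slope (illustrative value)

# --- data model: signal along coordinate 0, noise in the orthogonal complement ---
mu = np.zeros(d)
mu[0] = 1.0                                   # common signal direction
y = rng.choice([-1.0, 1.0], size=n)           # binary labels
xi = rng.normal(scale=0.5, size=(n, d))
xi[:, 0] = 0.0                                # restrict noise to the subspace orthogonal to mu
X = y[:, None] * mu[None, :] + xi             # x_i = y_i * mu + xi_i
assert np.allclose(xi @ mu, 0.0)              # orthogonality holds by construction

# --- two-layer leaky ReLU network with a trainable first layer ---
W = rng.normal(scale=0.1, size=(m, d))        # first-layer weights (trained by GD in the paper)
a = rng.choice([-1.0, 1.0], size=m) / m       # second-layer weights (fixed, a common simplification)

def leaky_relu(z):
    return np.where(z > 0, z, alpha * z)

def f(X):
    """Network output for each input row, shape (n,)."""
    return leaky_relu(X @ W.T) @ a

def hinge_loss(X, y):
    """Average hinge loss max(0, 1 - y * f(x)) over the sample."""
    return np.mean(np.maximum(0.0, 1.0 - y * f(X)))

print(hinge_loss(X, y))                       # training loss before any GD steps
```

Fixing the second-layer weights and training only `W` is a standard simplification in this line of analysis; the hinge loss drives GD toward interpolating solutions whose margins govern whether the resulting overfitting is benign or harmful.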