Overparameterized neural networks (NNs) are observed to generalize well even when trained to perfectly fit noisy data. This phenomenon has motivated a large body of work on "benign overfitting", in which interpolating predictors achieve near-optimal performance. Recently, it was conjectured and empirically observed that the behavior of NNs is often better described as "tempered overfitting", in which the performance is neither optimal nor trivial, and degrades as a function of the noise level. However, a theoretical justification of this claim for non-linear NNs has so far been lacking. In this work, we provide several results that aim to bridge these complementary views. We study a simple classification setting with 2-layer ReLU NNs and prove that, under various assumptions, the type of overfitting transitions from tempered in the extreme case of one-dimensional data to benign in high dimensions. Thus, we show that the input dimension plays a crucial role in determining the type of overfitting in this setting, which we also validate empirically for intermediate dimensions. Overall, our results shed light on the intricate connections between the dimension, sample size, architecture, and training algorithm on the one hand, and the type of resulting overfitting on the other.
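
The distinction between the two regimes can be made concrete with a short formal sketch. The notation below is an assumption introduced here for illustration and is not taken from the text above: $p$ denotes the label-flip probability, $f_n$ an interpolating predictor trained on $n$ noisy samples, $y^*$ a ground-truth labeling, and $L(f)$ the error measured against clean labels. The function $\varphi$ and the constants $c, C$ are likewise hypothetical placeholders.

```latex
% Illustrative formalization; all notation is assumed, not taken from the source.
% Clean test error of a predictor f with respect to ground-truth labels y^*:
L(f) \;=\; \Pr_{x}\bigl[\operatorname{sign} f(x) \neq y^*(x)\bigr]

% Benign overfitting: the interpolator is asymptotically near-optimal
% even though it fits labels flipped with probability p > 0.
\text{benign:} \qquad L(f_n) \xrightarrow[\;n \to \infty\;]{} 0

% Tempered overfitting: the error is neither optimal nor trivial,
% and it degrades as the noise level p grows.
\text{tempered:} \qquad c\,\varphi(p) \;\le\; L(f_n) \;\le\; C\,\varphi(p),
\qquad 0 < c \le C,\quad \varphi \text{ increasing},\quad \varphi(0) = 0,\quad C\,\varphi(p) < \tfrac{1}{2}
```

Under this reading, the transition described above places 2-layer ReLU NNs in the tempered regime for one-dimensional data and in the benign regime in high dimensions.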