Benign overfitting refers to the phenomenon in which over-parameterized neural networks fit the training data perfectly yet still generalize well to unseen data. While this phenomenon has been widely investigated theoretically, existing works are limited to two-layer networks with a fixed output layer, where only the hidden-layer weights are trained. We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) in which both layers are trainable, a setting closer to practice. Our results show that the initialization scale of the output layer is crucial to the training dynamics: with a large scale, training behaves similarly to the fixed-output-layer setting, as the hidden layer grows rapidly while the output layer remains largely unchanged; with a small scale, the layer interactions are more intricate, as the hidden layer first grows to a specific ratio relative to the output layer, after which the two layers grow jointly and maintain that ratio throughout training. Furthermore, in both regimes we provide nearly matching upper and lower bounds on the test error, identifying sharp conditions on the initialization scale and the signal-to-noise ratio (SNR) under which benign overfitting does or does not occur. Numerical experiments corroborate the theoretical results.
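For concreteness, a minimal sketch of the kind of model the abstract describes; the patch structure, width $m$, and Gaussian initialization below are illustrative assumptions rather than details taken from the paper:
$$
f(\mathbf{x}; \mathbf{W}, \mathbf{a}) \;=\; \sum_{r=1}^{m} a_r \sum_{p=1}^{P} \sigma\big(\langle \mathbf{w}_r, \mathbf{x}^{(p)} \rangle\big), \qquad \mathbf{w}_r \sim \mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I}), \quad a_r \sim \mathcal{N}(0, \sigma_a^2),
$$
where $\sigma(\cdot)$ is the ReLU activation, $\mathbf{x}^{(1)},\dots,\mathbf{x}^{(P)}$ are the input patches, and, unlike in the fixed-output-layer setting, both the hidden weights $\{\mathbf{w}_r\}$ and the output weights $\{a_r\}$ are trained. In this sketch, $\sigma_a$ plays the role of the output-layer initialization scale whose magnitude separates the two training regimes described above.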