Modern deep learning models with great expressive power can be trained to overfit the training data yet still generalize well. This phenomenon is referred to as benign overfitting. Recently, a few studies have attempted to understand benign overfitting in neural networks theoretically. However, these works are limited either to neural networks with smooth activation functions or to the neural tangent kernel regime. How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes-optimal test risk. Our result also reveals a sharp transition in test risk between benign and harmful overfitting, depending on the conditions imposed on the data distribution. Experiments on synthetic data back up our theory.
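To make the setting concrete, here is a minimal NumPy sketch of the kind of synthetic experiment described above. The data model is an illustrative assumption, not something specified in this abstract. Each input has two patches: a signal patch y_clean·μ and a Gaussian noise patch orthogonal to μ. The observed label is y_clean flipped with probability p, which is the label-flipping noise. The network is f = F_{+1} − F_{−1}, where each F_j averages ReLU activations of m filters over both patches, with the second-layer weights fixed. It is trained by full-batch gradient descent on the logistic loss. The function names (make_data, forward, loss, grad, train) and all sizes, scales, and step sizes are hypothetical choices made for this sketch.

```python
import numpy as np


def make_data(n, d, mu_norm, sigma_p, p, rng):
    """Hypothetical two-patch data model: x = [y_clean * mu, xi].

    The observed label y is y_clean flipped with probability p
    (label-flipping noise); the noise patch xi is Gaussian and kept
    orthogonal to the signal vector mu (both are assumptions).
    """
    mu = np.zeros(d)
    mu[0] = mu_norm                          # signal direction e_1 (assumption)
    y_clean = rng.choice([-1.0, 1.0], size=n)
    flip = rng.random(n) < p
    y = np.where(flip, -y_clean, y_clean)    # noisy labels used for training
    xi = rng.normal(0.0, sigma_p, size=(n, d))
    xi[:, 0] = 0.0                           # remove the component along mu
    X = np.stack([y_clean[:, None] * mu, xi], axis=1)  # shape (n, 2, d)
    return X, y, y_clean


def forward(W, X):
    """Two-layer ReLU CNN: f = F_{+1} - F_{-1}.

    W has shape (2, m, d); F_j = (1/m) * sum_r sum_patches ReLU(<w_{j,r}, x_p>),
    with the second-layer weights fixed to +1 / -1.
    """
    pre = np.einsum("jrd,npd->njrp", W, X)                   # (n, 2, m, patches)
    F = np.maximum(pre, 0.0).sum(axis=(2, 3)) / W.shape[1]   # (n, 2)
    return F[:, 0] - F[:, 1], pre


def loss(W, X, y):
    """Mean logistic loss log(1 + exp(-y f)), computed stably."""
    f, _ = forward(W, X)
    return np.mean(np.logaddexp(0.0, -y * f))


def grad(W, X, y):
    """Gradient of `loss` with respect to W (ReLU'(z) taken as 1[z > 0])."""
    f, pre = forward(W, X)
    n, m = X.shape[0], W.shape[1]
    g = -y * np.exp(-np.logaddexp(0.0, y * f)) / n   # dL/df_i = -y_i / (n (1 + e^{y_i f_i}))
    coef = g[:, None] * np.array([1.0, -1.0])        # F_{-1} enters f with a minus sign
    mask = (pre > 0).astype(float)                   # ReLU derivative
    return np.einsum("nj,njrp,npd->jrd", coef, mask, X) / m


def train(X, y, m, lr, steps, sigma0, rng):
    """Full-batch gradient descent from a small Gaussian initialization."""
    W = rng.normal(0.0, sigma0, size=(2, m, X.shape[2]))
    for _ in range(steps):
        W -= lr * grad(W, X, y)
    return W


rng = np.random.default_rng(0)
d, p = 200, 0.1                                      # illustrative sizes (assumption)
X, y, _ = make_data(40, d, mu_norm=5.0, sigma_p=1.0, p=p, rng=rng)
W = train(X, y, m=10, lr=0.5, steps=1000, sigma0=0.01, rng=rng)
train_loss = loss(W, X, y)
train_err = np.mean(np.sign(forward(W, X)[0]) != y)
Xt, yt, _ = make_data(2000, d, 5.0, 1.0, p, rng)
test_err = np.mean(np.sign(forward(W, Xt)[0]) != yt)  # measured against noisy test labels
print(f"train loss {train_loss:.4f}, train err {train_err:.3f}, test err {test_err:.3f}")
```

Under this assumed data model, the Bayes-optimal classifier predicts the clean label, so its test error against noisy labels is exactly p. Comparing test_err with p, and train_err with 0, is therefore one way to judge whether a particular run overfits benignly or harmfully. Varying mu_norm relative to sigma_p, the sample size, and d is a natural way to probe the transition mentioned above, although this sketch does not encode the precise conditions established in the work.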