Modern deep learning models are usually highly over-parameterized, so they can overfit the training data. Surprisingly, such overfitted neural networks often still achieve high prediction accuracy. To study this "benign overfitting" phenomenon, a line of recent work has theoretically analyzed the learning of linear models and two-layer neural networks. However, most of these analyses are limited to very simple learning problems in which the Bayes-optimal classifier is linear. In this work, we investigate a class of XOR-type classification tasks with label-flipping noise. We show that, under a certain condition on the sample complexity and signal-to-noise ratio, an over-parameterized ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy. We also establish a matching lower bound, showing that when this condition is not satisfied, the prediction accuracy of the obtained CNN is an absolute constant away from the Bayes-optimal rate. Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.