Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

Modern deep learning models are usually highly over-parameterized, so they can overfit the training data. Surprisingly, such overfitted neural networks can often still achieve high prediction accuracy. To study this "benign overfitting" phenomenon, a line of recent work has theoretically analyzed the learning of linear models and two-layer neural networks. However, most of these analyses are limited to very simple learning problems in which the Bayes-optimal classifier is linear. In this work, we investigate a class of XOR-type classification tasks with label-flipping noise. We show that, under a certain condition on the sample complexity and signal-to-noise ratio, an over-parameterized ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy. Moreover, we establish a matching lower bound showing that when this condition is not satisfied, the prediction accuracy of the obtained CNN is an absolute constant away from the Bayes-optimal rate. Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.
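To make the setting concrete, the following is a minimal sketch of one plausible XOR-type task with label-flipping noise. The abstract does not specify the construction, so everything here is an assumption chosen for illustration: the orthogonal vectors `a` and `b`, the two-patch layout (one signal patch, one Gaussian noise patch), and the parameters `sigma_p` and `flip_prob`. The sketch shows why such a task falls outside the linear setting: a rule based on the product of two projections recovers every clean label, while a least-squares linear classifier on the same signal patch stays near chance.

```python
import numpy as np

def make_xor_data(n, d, a, b, sigma_p, flip_prob, rng):
    """Sample n examples from a hypothetical XOR-type model (illustrative only).

    Signals +/-(a + b) carry clean label +1 and signals +/-(a - b) carry
    clean label -1, with a orthogonal to b. Each observed label is flipped
    independently with probability flip_prob.
    """
    y_clean = rng.choice([-1, 1], size=n)   # clean labels
    sign = rng.choice([-1, 1], size=n)      # which of the two clusters per class
    signal = sign[:, None] * (a[None, :] + y_clean[:, None] * b[None, :])

    # Gaussian noise patch, projected onto the orthogonal complement of a and b.
    noise = sigma_p * rng.standard_normal((n, d))
    for v in (a, b):
        noise -= np.outer(noise @ v, v) / (v @ v)

    flips = rng.random(n) < flip_prob       # label-flipping noise
    y = np.where(flips, -y_clean, y_clean)
    return signal, noise, y, y_clean

rng = np.random.default_rng(0)
d = 20
a = np.zeros(d); a[0] = 2.0                 # assumed feature directions
b = np.zeros(d); b[1] = 1.0
signal, noise, y, y_clean = make_xor_data(2000, d, a, b, 1.0, 0.1, rng)

# Nonlinear (XOR) rule: the sign of the product of the two projections.
xor_pred = np.sign((signal @ a) * (signal @ b))
xor_acc = np.mean(xor_pred == y)

# Linear baseline: least-squares fit on the signal patch alone.
w, *_ = np.linalg.lstsq(signal, y.astype(float), rcond=None)
lin_acc = np.mean(np.sign(signal @ w) == y)

print(f"XOR rule accuracy on noisy labels: {xor_acc:.3f}")
print(f"linear classifier accuracy:        {lin_acc:.3f}")
```

In this toy model the XOR rule matches the clean labels exactly, so its accuracy on the observed labels is limited only by the flip rate. Any linear classifier assigns opposite predictions to a signal and its negation, which share the same clean label, so it cannot do much better than chance.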


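Related content

Overfitting

In AI, overfitting usually refers to a learned model that is too complex: it performs well on the training set but poorly on the test set, so it generalizes badly. Overfitting (also called over-learning) arises during parameter fitting because the training data contain sampling error, and a complex model fits that sampling error along with the underlying pattern.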
