The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Recent empirical and theoretical studies have established the generalization capabilities of large machine learning models that are trained to (approximately or exactly) fit noisy data. In this work, we prove a surprising result: even if the ground truth itself is robust to adversarial examples, and the benignly overfitted model is benign in terms of the "standard" out-of-sample risk objective, this benign overfitting process can be harmful when out-of-sample data are subject to adversarial manipulation. More specifically, our main results contain two parts: (i) the min-norm estimator in overparameterized linear models always leads to adversarial vulnerability in the "benign overfitting" setting; (ii) we verify an asymptotic trade-off result between the standard risk and the "adversarial" risk of every ridge regression estimator, implying that under suitable conditions no single choice of the ridge regularization parameter can make both risks small at the same time. Furthermore, under the lazy training regime, we demonstrate parallel results for the two-layer neural tangent kernel (NTK) model, which align with empirical observations in deep neural networks. Our findings provide theoretical insight into a puzzling phenomenon observed in practice: the true target function (e.g., a human) is robust against adversarial attacks, while benignly overfitted neural networks lead to models that are not robust.
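
The trade-off described in (i) and (ii) can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a Gaussian design with a hypothetical spiked diagonal covariance, an l2-bounded test-time perturbation of radius `alpha`, and a ridge estimator written in dual form with penalty `n * lam`. For a linear predictor, the worst-case squared error over the l2 ball has the closed form `(|r| + alpha * ||theta_hat||)^2`, where `r` is the unperturbed prediction error, so the adversarial risk grows with the norm of the fitted parameter.

```python
import numpy as np


def ridge_estimator(X, y, lam):
    """Ridge solution in dual form; lam = 0 gives the min-norm interpolator when n < d."""
    n = X.shape[0]
    # theta = X^T (X X^T + n * lam * I)^{-1} y  (the n * lam scaling is an assumption)
    gram = X @ X.T + n * lam * np.eye(n)
    return X.T @ np.linalg.solve(gram, y)


def standard_risk(theta_hat, theta_star, sigma_diag):
    """Excess risk E[(x^T (theta_hat - theta_star))^2] for x ~ N(0, diag(sigma_diag))."""
    diff = theta_hat - theta_star
    return float(np.sum(sigma_diag * diff ** 2))


def adversarial_risk(theta_hat, theta_star, X_test, alpha):
    """Monte Carlo estimate of E[max_{||d||_2 <= alpha} ((x + d)^T theta_hat - x^T theta_star)^2].

    For a linear predictor the inner maximum equals (|r| + alpha * ||theta_hat||_2)^2,
    where r = x^T (theta_hat - theta_star).
    """
    resid = np.abs(X_test @ (theta_hat - theta_star))
    return float(np.mean((resid + alpha * np.linalg.norm(theta_hat)) ** 2))


def run_demo(seed=0, n=50, d=2000, k=5, noise_std=0.5, alpha=1.0,
             lams=(0.0, 0.01, 0.1, 1.0)):
    """Return (lam, ||theta_hat||, standard risk, adversarial risk) for each lam.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical spiked covariance: k strong directions, many weak tail directions.
    sigma_diag = np.full(d, 1e-3)
    sigma_diag[:k] = 1.0

    # Ground truth supported on the strong directions, with unit norm.
    theta_star = np.zeros(d)
    theta_star[:k] = 1.0 / np.sqrt(k)

    scale = np.sqrt(sigma_diag)
    X = rng.normal(size=(n, d)) * scale
    y = X @ theta_star + noise_std * rng.normal(size=n)
    X_test = rng.normal(size=(2000, d)) * scale

    results = []
    for lam in lams:
        theta_hat = ridge_estimator(X, y, lam)
        results.append((
            lam,
            float(np.linalg.norm(theta_hat)),
            standard_risk(theta_hat, theta_star, sigma_diag),
            adversarial_risk(theta_hat, theta_star, X_test, alpha),
        ))
    return results


if __name__ == "__main__":
    for lam, norm, std_risk, adv_risk in run_demo():
        print(f"lam={lam:<5} ||theta||={norm:.3f} "
              f"standard={std_risk:.4f} adversarial={adv_risk:.4f}")
```

Running the script prints one row per regularization level, which makes it easy to compare how the parameter norm, the standard risk, and the adversarial risk move as `lam` changes. The numbers depend on the assumed covariance and perturbation radius and should not be read as reproducing the paper's theorems.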

