The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Recent empirical and theoretical studies have established the generalization capabilities of large machine learning models that are trained to (approximately or exactly) fit noisy data. In this work, we prove a surprising result: even if the ground truth itself is robust to adversarial examples, and the benignly overfitted model is benign in terms of the "standard" out-of-sample risk objective, this benign overfitting process can be harmful when out-of-sample data are subject to adversarial manipulation. More specifically, our main results contain two parts: (i) the min-norm estimator in an overparameterized linear model always leads to adversarial vulnerability in the "benign overfitting" setting; (ii) we verify an asymptotic trade-off result between the standard risk and the "adversarial" risk of every ridge regression estimator, implying that under suitable conditions these two quantities cannot both be made small by any single choice of the ridge regularization parameter. Furthermore, under the lazy training regime, we demonstrate parallel results on a two-layer neural tangent kernel (NTK) model, which align with empirical observations in deep neural networks. Our findings provide theoretical insight into a puzzling phenomenon observed in practice, where the true target function (e.g., a human) is robust against adversarial attacks, while benignly overfitted neural networks are not.
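The two estimators and the two risks named above can be illustrated numerically. The following is a minimal sketch, not the paper's experimental setup: it assumes isotropic Gaussian features, an l2-bounded perturbation of radius alpha on the test input, and a noiseless linear ground truth as the test target. The function names, the unscaled penalty `lam`, and all constants are hypothetical choices made for illustration. For a linear predictor, the worst-case squared error under such a perturbation has the closed form (|<x, theta_hat - theta_star>| + alpha * ||theta_hat||_2)^2, so the norm of the estimator directly controls the gap between the two risks.

```python
import numpy as np


def ridge_estimator(X, y, lam):
    """Ridge solution in dual form: X^T (X X^T + lam I)^{-1} y.

    With lam = 0 and n < d (full row rank) this is the min-norm interpolator.
    The penalty scaling is an assumption; the paper may normalize differently.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


def standard_risk(theta_hat, theta_star, X_test):
    """Monte Carlo estimate of E[(<x, theta_hat> - <x, theta_star>)^2]."""
    return float(np.mean((X_test @ (theta_hat - theta_star)) ** 2))


def adversarial_risk(theta_hat, theta_star, X_test, alpha):
    """Worst-case squared error when each test input may move by at most alpha in l2 norm.

    Uses the closed form for linear predictors:
    (|<x, theta_hat - theta_star>| + alpha * ||theta_hat||_2)^2.
    """
    residual = np.abs(X_test @ (theta_hat - theta_star))
    return float(np.mean((residual + alpha * np.linalg.norm(theta_hat)) ** 2))


# Hypothetical overparameterized setup: n samples, d >> n features, noisy labels.
rng = np.random.default_rng(0)
n, d, noise_std, alpha = 50, 500, 0.5, 1.0
theta_star = np.zeros(d)
theta_star[0] = 1.0  # ground truth with small norm, hence robust itself
X = rng.standard_normal((n, d))
y = X @ theta_star + noise_std * rng.standard_normal(n)
X_test = rng.standard_normal((2000, d))

theta_min_norm = ridge_estimator(X, y, lam=0.0)
theta_ridge = ridge_estimator(X, y, lam=100.0)

for name, theta in [("min-norm", theta_min_norm), ("ridge", theta_ridge)]:
    print(
        name,
        "standard:", standard_risk(theta, theta_star, X_test),
        "adversarial:", adversarial_risk(theta, theta_star, X_test, alpha),
    )
```

Sweeping `lam` and `alpha` in this sketch gives a rough feel for how shrinking the estimator's norm changes each risk, but it does not reproduce the paper's asymptotic conditions or its covariance assumptions.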


