Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies have empirically demonstrated that it suffers from robust overfitting, i.e., prolonged AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, which reveal a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to the one obtained without AT, and thus causes robust overfitting. Based on our theoretical results, we further design a method named Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs achieve robustness comparable to that of their finite-width counterparts, which in turn justifies our theoretical findings. The code is available at https://github.com/fshp971/adv-ntk.
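As a rough illustration of what "approximated by a linearized DNN" means, the sketch below builds the first-order Taylor expansion of a small network around its initial parameters and compares it with the original network after a small parameter change. It is not the paper's AT dynamics and not Adv-NTK. The two-layer tanh architecture, the 1/sqrt(width) output scaling, the width, the step size, and all function names (`init_params`, `f`, `grad_f`, `f_lin`) are assumptions chosen only for this example.

```python
import numpy as np

def init_params(rng, d_in, width):
    """Standard Gaussian initialization for a two-layer network (assumed setup)."""
    W = rng.standard_normal((width, d_in))
    a = rng.standard_normal(width)
    return W, a

def f(params, x):
    """Two-layer tanh network with 1/sqrt(width) output scaling."""
    W, a = params
    return a @ np.tanh(W @ x) / np.sqrt(a.shape[0])

def grad_f(params, x):
    """Analytic gradient of f with respect to (W, a) at input x."""
    W, a = params
    h = np.tanh(W @ x)
    scale = 1.0 / np.sqrt(a.shape[0])
    dW = scale * np.outer(a * (1.0 - h ** 2), x)  # d f / d W
    da = scale * h                                # d f / d a
    return dW, da

def f_lin(params, params0, x):
    """First-order Taylor expansion of f around params0, evaluated at params."""
    dW, da = grad_f(params0, x)
    (W, a), (W0, a0) = params, params0
    return f(params0, x) + np.sum(dW * (W - W0)) + da @ (a - a0)

rng = np.random.default_rng(0)
d_in, width = 5, 4096
params0 = init_params(rng, d_in, width)
x = rng.standard_normal(d_in)

# Move every parameter a small random distance away from initialization.
step = 1e-2
params = (params0[0] + step * rng.standard_normal((width, d_in)),
          params0[1] + step * rng.standard_normal(width))

gap = abs(f(params, x) - f_lin(params, params0, x))
print(f"f = {f(params, x):.6f}, f_lin = {f_lin(params, params0, x):.6f}, gap = {gap:.2e}")
```

The linearized model is affine in the parameters, which is what makes closed-form training dynamics tractable under squared loss. The gap printed here only reflects a random perturbation of the parameters, so it says nothing about how close the two models stay along an actual adversarial training trajectory.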