Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies have empirically shown that it suffers from robust overfitting, i.e., prolonged AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics can be derived for the linearized DNN. These dynamics reveal a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to the one obtained without AT, and thus leads to robust overfitting. Based on our theoretical results, we further design a method, Adv-NTK, which is the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK helps infinite-width DNNs achieve robustness comparable to that of their finite-width counterparts, which in turn supports our theoretical findings. The code is available at https://github.com/fshp971/adv-ntk.
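The closed-form dynamics mentioned above can be illustrated with a minimal sketch. It evaluates the standard, non-adversarial NTK result for a linearized model trained by gradient flow on the squared loss 0.5·||f(X) − y||², with the kernel Θ held fixed:

f_t(x) = f_0(x) + Θ(x, X) Θ(X, X)^{-1} (I − exp(−η Θ(X, X) t)) (y − f_0(X)).

It does not reproduce the paper's adversarial dynamics or Adv-NTK. The RBF kernel is a hypothetical stand-in for an actual NTK, since the formula holds for any fixed symmetric PSD kernel. The function names `rbf_kernel` and `linearized_predict` are illustrative only.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    # Hypothetical stand-in for the NTK: any fixed symmetric PSD kernel
    # can be plugged into the closed form below.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def linearized_predict(X_train, y_train, X_test, f0_train, f0_test, eta, t,
                       kernel=rbf_kernel):
    """Gradient-flow prediction of a linearized model at training time t.

    Squared loss 0.5 * ||f(X) - y||^2 with a fixed kernel K:
        f_t(x) = f_0(x) + K(x, X) K^{-1} (I - exp(-eta K t)) (y - f_0(X))
    """
    K = kernel(X_train, X_train)
    lam, V = np.linalg.eigh(K)          # K = V diag(lam) V^T
    lam = np.clip(lam, 0.0, None)       # remove tiny negative round-off
    # Apply K^{-1}(I - exp(-eta K t)) eigenvalue by eigenvalue; the
    # limit of (1 - exp(-eta*lam*t)) / lam as lam -> 0 is eta * t.
    safe = np.maximum(lam, 1e-12)
    g = np.where(lam > 1e-12, -np.expm1(-eta * lam * t) / safe, eta * t)
    coef = V @ (g * (V.T @ (y_train - f0_train)))
    return f0_test + kernel(X_test, X_train) @ coef
```

With t = 0 the prediction equals f_0. As t grows, the exponential term vanishes, and the training-set predictions converge to the labels y, which is plain kernel regression. The abstract's degeneration result concerns the analogous long-time limit of the adversarial dynamics, which this sketch does not model.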