While adversarial training methods have led to significant improvements in deep neural networks' robustness against norm-bounded adversarial perturbations, their generalization performance from training samples to test data has been shown to be considerably worse than that of standard empirical risk minimization. Several recent studies seek to connect the generalization behavior of adversarially trained classifiers to the various gradient-based min-max optimization algorithms used for their training. In this work, we study the generalization performance of adversarial training methods through the algorithmic stability framework. Specifically, our goal is to compare the generalization performance of the vanilla adversarial training scheme, which fully optimizes the perturbations at every iteration, with that of free adversarial training, which simultaneously optimizes the norm-bounded perturbations and the classifier parameters. The generalization bounds we prove indicate that the free adversarial training method can enjoy a lower generalization gap between training and test samples due to the simultaneous nature of its min-max optimization algorithm. We perform several numerical experiments to evaluate the generalization performance of vanilla, fast, and free adversarial training methods. Our empirical findings also show the improved generalization performance of the free adversarial training method, and further demonstrate that its better generalization can translate into greater robustness against black-box attack schemes. The code is available at https://github.com/Xiwei-Cheng/Stability_FreeAT.
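The structural difference between the two schemes compared above can be illustrated with a minimal sketch. The toy model below (logistic regression with an L∞-bounded perturbation, hand-coded gradients) is our own illustrative construction, not the paper's experimental setup: the vanilla step runs a full inner PGD loop over the perturbation before each model update, while the free step reuses a single gradient computation to update the perturbation and the parameters simultaneously, replaying each minibatch several times.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_w(w, x, y):
    # gradient of the logistic loss log(1 + exp(-y * w.x)) w.r.t. the weights w
    return -y * sigmoid(-y * (w @ x)) * x

def grad_x(w, x, y):
    # gradient of the same loss w.r.t. the input (used to craft the perturbation)
    return -y * sigmoid(-y * (w @ x)) * w

def vanilla_at_step(w, x, y, eps=0.1, lr=0.1, pgd_steps=10, pgd_lr=0.05):
    """Vanilla adversarial training: fully optimize the perturbation
    with an inner PGD loop, then take one gradient step on w."""
    delta = np.zeros_like(x)
    for _ in range(pgd_steps):
        delta += pgd_lr * np.sign(grad_x(w, x + delta, y))
        delta = np.clip(delta, -eps, eps)  # project onto the L-inf ball
    return w - lr * grad_w(w, x + delta, y)

def free_at_step(w, x, y, eps=0.1, lr=0.1, replays=10):
    """Free adversarial training: update delta and w simultaneously from
    one gradient pass, replaying the same example `replays` times."""
    delta = np.zeros_like(x)
    for _ in range(replays):
        gw = grad_w(w, x + delta, y)
        gd = grad_x(w, x + delta, y)
        w = w - lr * gw                                    # descent on w
        delta = np.clip(delta + eps * np.sign(gd), -eps, eps)  # ascent on delta
    return w
```

Both steps reduce the clean training loss on a toy example, but the free step never fully solves the inner maximization; this weaker coupling between the perturbation and the current parameters is what the stability analysis exploits.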