Deep learning models have achieved state-of-the-art performance in various domains, but they are vulnerable to inputs with small, well-crafted perturbations, known as adversarial examples (AEs). Among the many strategies for improving model robustness against AEs, adversarial training based on Projected Gradient Descent (PGD) is one of the most effective. Unfortunately, the prohibitive computational overhead of generating sufficiently strong AEs, which stems from maximizing the loss function, sometimes makes regular PGD adversarial training impractical for larger and more complicated models. In this paper, we propose that the adversarial loss can be approximated by a partial sum of the Taylor series. Furthermore, we approximate the gradient of the adversarial loss and propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models. Extensive experiments demonstrate that this efficiency improvement can be achieved with little or no loss in accuracy on natural and adversarial examples: our proposed method saves up to 60% of the training time with comparable model test accuracy on the MNIST, CIFAR-10 and CIFAR-100 datasets.
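To make the cost argument concrete, the following is a minimal sketch, not the GAAT algorithm itself, whose details are not given in this abstract. It assumes a toy logistic-regression model and an L-infinity perturbation budget `eps`, and compares the adversarial loss found by iterative PGD with the value given by the first-order partial sum of the Taylor series, L(x + δ) ≈ L(x) + ∇L(x)·δ, whose maximum over the budget is L(x) + eps·‖∇L(x)‖₁. All function names, weights, and hyperparameters here are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic model (illustrative only)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Binary cross-entropy on a single example, with y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_perturbation(w, b, x, y, eps, step, n_steps):
    """Iterative L-infinity PGD: one gradient evaluation per step."""
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = grad_x(w, b, x_adv, y)
        # Ascend along the gradient sign, then project back onto the eps-ball.
        delta = [max(-eps, min(eps, di + step * sign(gi)))
                 for di, gi in zip(delta, g)]
    return delta


def taylor_adv_loss(w, b, x, y, eps):
    """First-order Taylor estimate of the adversarial loss.

    Uses a single gradient evaluation at the clean input x.
    """
    g = grad_x(w, b, x, y)
    return loss(w, b, x, y) + eps * sum(abs(gi) for gi in g)


# Hypothetical model parameters and a single labelled input.
w, b = [1.5, -2.0, 0.5], 0.1
x, y = [0.2, -0.4, 1.0], 1
eps = 0.1

delta = pgd_perturbation(w, b, x, y, eps, step=0.025, n_steps=10)
clean_loss = loss(w, b, x, y)
pgd_loss = loss(w, b, [xi + di for xi, di in zip(x, delta)], y)
approx_loss = taylor_adv_loss(w, b, x, y, eps)

print("clean loss:          ", clean_loss)
print("PGD adversarial loss:", pgd_loss)
print("Taylor approximation:", approx_loss)
```

In this sketch PGD needs ten gradient evaluations per example while the Taylor estimate needs one, which is the kind of saving the abstract refers to. Because the toy loss is convex in the input, the first-order estimate is a lower bound on the PGD value here; this property should not be assumed for deep networks.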