Deep learning models have achieved state-of-the-art performance in various domains, but they are vulnerable to inputs with small, well-crafted perturbations, known as adversarial examples (AEs). Among the many strategies for improving model robustness against AEs, adversarial training based on Projected Gradient Descent (PGD) is one of the most effective. Unfortunately, generating sufficiently strong AEs requires maximizing the loss function, and the resulting computational overhead can make regular PGD adversarial training impractical for larger and more complicated models. In this paper, we propose that the adversarial loss can be approximated by a partial sum of its Taylor series. Furthermore, we approximate the gradient of the adversarial loss and propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models. Extensive experiments demonstrate that this efficiency improvement can be achieved with little or no loss in accuracy on natural and adversarial examples: our proposed method saves up to 60% of the training time with comparable model test accuracy on the MNIST, CIFAR-10 and CIFAR-100 datasets.
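To make the idea of approximating an adversarial loss by a partial Taylor sum concrete, the following is a minimal sketch and not the GAAT algorithm itself. It assumes a first-order expansion around the clean input, L(x + δ) ≈ L(x) + ∇ₓL(x)·δ, a linear logistic-regression model, and a single signed-gradient perturbation in place of multi-step PGD; the abstract does not state the expansion order or expansion point used by the method. All function names (`logistic_loss`, `input_gradient`, `taylor_adversarial_loss`, `signed_gradient_delta`) and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a linear model on one example, y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # log(1 + exp(z)) - y * z, with the softplus term computed stably.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z


def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x (not the weights)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))  # sigmoid; fine for moderate |z|
    return [(p - y) * wi for wi in w]


def signed_gradient_delta(w, b, x, y, eps):
    """One-step L-infinity perturbation: eps * sign(grad). Assumed stand-in for PGD."""
    grad = input_gradient(w, b, x, y)
    return [eps * math.copysign(1.0, g) for g in grad]


def taylor_adversarial_loss(w, b, x, y, delta):
    """First-order Taylor estimate of the loss at x + delta, expanded at x."""
    grad = input_gradient(w, b, x, y)
    return logistic_loss(w, b, x, y) + sum(g * d for g, d in zip(grad, delta))


# Illustrative (made-up) model and example.
w, b = [0.5, -1.0, 2.0], 0.1
x, y = [1.0, 2.0, -0.5], 1
delta = signed_gradient_delta(w, b, x, y, eps=0.01)

x_adv = [xi + di for xi, di in zip(x, delta)]
exact = logistic_loss(w, b, x_adv, y)                 # needs a second forward pass
approx = taylor_adversarial_loss(w, b, x, y, delta)   # reuses quantities computed at x
print("clean loss:", logistic_loss(w, b, x, y))
print("exact adversarial loss:", exact)
print("first-order Taylor estimate:", approx)
```

In this toy setting the estimate is computed entirely from the loss and gradient at the clean input, which is the kind of saving the abstract attributes to avoiding the inner maximization. For a small perturbation budget the gap between the two values is second order in the perturbation size.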