Deep neural networks are known to be vulnerable to adversarial examples: inputs modified slightly to fool a network into making incorrect predictions. This has motivated a large body of research on evaluating the robustness of such networks to these perturbations. One particularly important metric is robustness to minimal $\ell_2$ adversarial perturbations, yet existing methods for evaluating it are either computationally expensive or inaccurate. In this paper, we introduce a new family of adversarial attacks that strikes a balance between effectiveness and computational efficiency. The proposed attacks generalize the well-known DeepFool (DF) attack while remaining simple to understand and implement. We demonstrate that they outperform existing methods in both effectiveness and computational efficiency, scale to the evaluation of large models, and can be used for adversarial training (AT) to achieve state-of-the-art robustness to minimal $\ell_2$ adversarial perturbations.
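To make the starting point concrete, the following is a minimal sketch of the classic DeepFool geometry that the proposed attacks generalize, not the paper's own method. For an affine multiclass classifier $f(x) = Wx + b$, the minimal $\ell_2$ perturbation is the projection onto the nearest decision boundary, and a single DeepFool step computes it exactly; the function name and toy dimensions are our own choices.

```python
import numpy as np

def deepfool_affine(x, W, b):
    """Exact minimal l2 perturbation for an affine classifier f(x) = W @ x + b.

    For each class k != y (the current prediction), the decision boundary
    {z : f_k(z) = f_y(z)} is a hyperplane; the closest one determines the
    minimal perturbation, reached in a single projection step.
    """
    scores = W @ x + b
    y = int(np.argmax(scores))
    best_r, best_dist = None, np.inf
    for k in range(len(b)):
        if k == y:
            continue
        w_diff = W[k] - W[y]                      # normal of the k-vs-y hyperplane
        f_diff = scores[k] - scores[y]            # negative, since y is the argmax
        dist = abs(f_diff) / np.linalg.norm(w_diff)
        if dist < best_dist:
            best_dist = dist
            # Projection of x onto the k-vs-y hyperplane, expressed as a step.
            best_r = (abs(f_diff) / np.linalg.norm(w_diff) ** 2) * w_diff
    return best_r

# Toy example: a random 3-class affine classifier in 5 dimensions.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

r = deepfool_affine(x, W, b)
y0 = int(np.argmax(W @ x + b))
# A slight overshoot (factor 1.02, as in the original DeepFool) crosses
# the boundary so the predicted label actually changes.
y1 = int(np.argmax(W @ (x + 1.02 * r) + b))
assert y1 != y0
```

For deep networks, DeepFool applies this step iteratively to a local linearization of $f$ at the current iterate, which is why it is fast but only approximately minimal.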