Deep neural networks are susceptible to adversarial attacks, which threaten their security and reliability in real-world applications. Among the most notable are transfer-based attacks, in which an adversarial example crafted to fool one model also fools other models. While previous research has improved the transferability of untargeted adversarial examples, generating targeted adversarial examples that transfer between models remains challenging. In this work, we present a novel approach to generating transferable targeted adversarial examples by exploiting the vulnerability of deep neural networks to perturbations of the high-frequency components of images. We observe that replacing the high-frequency component of an image with that of another image can mislead deep models, which motivates us to craft perturbations containing high-frequency information to achieve targeted attacks. To this end, we propose the Low-Frequency Adversarial Attack (LFAA), which trains a conditional generator to produce targeted adversarial perturbations that are then added to the low-frequency component of the image. Extensive experiments on ImageNet demonstrate that our approach significantly outperforms state-of-the-art methods, improving targeted attack success rates by margins ranging from 3.2% to 15.5%.
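The high-frequency swap behind this observation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the frequency components are separated, so the ideal circular low-pass mask in the FFT spectrum, the `radius` cutoff, and the helper names `split_frequencies` and `swap_high_frequency` are all assumptions made for illustration, and random arrays stand in for grayscale images.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask applied to the centered
    FFT spectrum. The paper may use a different decomposition.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= radius  # keep only frequencies near the center
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = image - low  # the residual holds the high frequencies
    return low, high


def swap_high_frequency(image_a, image_b, radius=8):
    """Keep the low frequencies of image_a, take the high ones from image_b."""
    low_a, _ = split_frequencies(image_a, radius)
    _, high_b = split_frequencies(image_b, radius)
    return low_a + high_b


# Hypothetical stand-ins for two grayscale images.
rng = np.random.default_rng(0)
image_a = rng.random((32, 32))
image_b = rng.random((32, 32))

low_a, high_a = split_frequencies(image_a, radius=8)
hybrid = swap_high_frequency(image_a, image_b, radius=8)
```

Under these assumptions, `hybrid` keeps the coarse content of `image_a` while carrying the fine detail of `image_b`. Feeding such an image to a classifier is one way to probe the sensitivity to high-frequency content that the abstract describes.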