Adversarial purification using generative models demonstrates strong adversarial defense performance. These methods are classifier- and attack-agnostic, which makes them versatile but often computationally intensive. Recent advances in diffusion and score networks have improved image generation and, by extension, adversarial purification. Adversarial training, another highly efficient class of defense methods, requires specific knowledge of attack vectors, so the defended models must be trained extensively on adversarial examples. To overcome these limitations, we introduce a new framework, Language Guided Adversarial Purification (LGAP), which uses pre-trained diffusion models and caption generators to defend against adversarial attacks. Given an input image, our method first generates a caption, which then guides the adversarial purification process through a diffusion network. We evaluate our approach against strong adversarial attacks and demonstrate its effectiveness in enhancing adversarial robustness. Our results indicate that LGAP outperforms most existing adversarial defense techniques without requiring specialized network training. This underscores the generalizability of models trained on large datasets and highlights a promising direction for further research.
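The two-stage flow described in the abstract (caption the input, then purify it under caption guidance before classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `LanguageGuidedPurifier`, its fields, and the `toy_*` functions are all hypothetical, and the toy stand-ins only mimic the interfaces that a pre-trained caption generator and a text-conditioned diffusion model would expose.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[float]  # toy stand-in for an image tensor (assumption)


@dataclass
class LanguageGuidedPurifier:
    """Hypothetical wrapper: caption the input, purify it, then classify."""
    captioner: Callable[[Image], str]        # image -> caption
    purifier: Callable[[Image, str], Image]  # (image, caption) -> purified image
    classifier: Callable[[Image], int]       # downstream classifier, left untouched

    def purify(self, image: Image) -> Image:
        # Step 1: generate a caption from the (possibly adversarial) input.
        caption = self.captioner(image)
        # Step 2: let the caption guide the purification step.
        return self.purifier(image, caption)

    def predict(self, image: Image) -> int:
        # The classifier only ever sees the purified image.
        return self.classifier(self.purify(image))


def toy_captioner(image: Image) -> str:
    # Placeholder for a pre-trained caption generator.
    mean = sum(image) / len(image)
    return "a bright image" if mean > 0.5 else "a dark image"


def toy_purifier(image: Image, caption: str) -> Image:
    # Placeholder for caption-conditioned diffusion: pull each pixel
    # halfway toward a target value chosen from the caption.
    target = 1.0 if "bright" in caption else 0.0
    return [0.5 * p + 0.5 * target for p in image]


def toy_classifier(image: Image) -> int:
    # Placeholder classifier: 1 for bright, 0 for dark.
    return int(sum(image) / len(image) > 0.5)


defense = LanguageGuidedPurifier(toy_captioner, toy_purifier, toy_classifier)
perturbed = [0.9, 0.4, 0.8, 0.45]  # made-up input with two pixels pushed down
purified = defense.purify(perturbed)
label = defense.predict(perturbed)
```

Because the three components are passed in as plain callables, the classifier needs no retraining and the captioner or purifier can be swapped independently, which mirrors the classifier- and attack-agnostic property the abstract attributes to purification methods.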