Adversarial attacks can mislead deep neural network classifiers by introducing slight perturbations to their inputs. Developing algorithms that mitigate these attacks is crucial for the safe use of artificial intelligence. Recent studies suggest that score-based diffusion models are effective for adversarial defense. However, existing diffusion-based defenses rely on sequentially simulating the reverse-time stochastic differential equations of diffusion models, which is computationally inefficient and yields suboptimal results. In this paper, we introduce ScoreOpt, a novel adversarial defense scheme that optimizes adversarial samples at test time toward the original clean data, in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100, and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in both robustness and inference speed.
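To give a rough feel for what optimizing a sample along a score-based direction can look like, the following is a toy sketch and not the ScoreOpt objective itself. It assumes, purely for illustration, that the clean-data prior is an isotropic Gaussian whose score is known in closed form; in practice the score would come from a pretrained score-based diffusion model. The names `gaussian_score`, `purify`, `step_size`, and `num_steps`, and all numeric values, are hypothetical.

```python
# Toy sketch only: NOT the ScoreOpt objective from the paper.
# Assumption: the clean-data prior is an isotropic Gaussian N(mu, sigma^2 I),
# so its score (gradient of the log-density) has a closed form.
# A real defense would query a pretrained score network instead.


def gaussian_score(x, mu, sigma):
    """Score of N(mu, sigma^2 I) at x: (mu - x) / sigma^2 per coordinate."""
    return [(m - xi) / sigma ** 2 for xi, m in zip(x, mu)]


def purify(x_adv, score_fn, step_size=0.1, num_steps=50):
    """Move x_adv along the score direction for a fixed number of steps."""
    x = list(x_adv)
    for _ in range(num_steps):
        s = score_fn(x)
        # Gradient ascent on the log-density of the assumed prior.
        x = [xi + step_size * si for xi, si in zip(x, s)]
    return x


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


mu = [0.5, 0.5, 0.5]        # hypothetical mode of the clean-data prior
x_adv = [0.58, 0.42, 0.55]  # hypothetical perturbed input
x_pur = purify(x_adv, lambda x: gaussian_score(x, mu, sigma=1.0))
```

In this toy setting, each step shrinks the offset from `mu` by a constant factor, so `x_pur` ends up much closer to the mode of the assumed prior than `x_adv`. Unlike sequential simulation of a reverse-time SDE, the loop is a plain optimization whose step count can be chosen freely, which is the aspect of the abstract this sketch is meant to illustrate.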