Adversarial attacks can mislead deep neural network classifiers by introducing slight perturbations to their inputs. Developing algorithms that mitigate the effects of these attacks is crucial for the safe use of artificial intelligence. Recent studies suggest that score-based diffusion models are effective for adversarial defense. However, existing diffusion-based defenses rely on sequentially simulating the reverse stochastic differential equations of diffusion models, which is computationally inefficient and yields suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test time toward the original clean data, in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100, and ImageNet. The results demonstrate that our approach outperforms existing adversarial defenses in both robustness and inference speed.
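To make the idea of test-time optimization guided by a score-based prior concrete, the following is a minimal toy sketch, not the ScoreOpt algorithm itself: the abstract does not specify the objective or the update rule. The sketch assumes the prior is an isotropic Gaussian whose score (the gradient of the log-density) is known in closed form, standing in for a learned score network, and it uses plain gradient ascent on the log-density. The names `gaussian_score`, `score_guided_optimize`, `distance`, and all numeric values are hypothetical and chosen only for illustration.

```python
# Toy illustration only: an analytic Gaussian score stands in for a learned
# score network, and plain gradient ascent stands in for the paper's optimizer.

def gaussian_score(x, mean, sigma):
    """Score (gradient of the log-density) of N(mean, sigma^2 I) evaluated at x."""
    return [-(xi - mi) / sigma ** 2 for xi, mi in zip(x, mean)]


def score_guided_optimize(x_adv, mean, sigma=1.0, step_size=0.1, num_steps=50):
    """Move x_adv along the score direction with fixed-step gradient ascent."""
    x = list(x_adv)
    for _ in range(num_steps):
        s = gaussian_score(x, mean, sigma)
        x = [xi + step_size * si for xi, si in zip(x, s)]
    return x


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


clean = [0.5, -0.2, 0.8]    # hypothetical clean point (mode of the toy prior)
x_adv = [0.6, -0.35, 0.9]   # hypothetical slightly perturbed input
x_opt = score_guided_optimize(x_adv, clean)

print("distance before:", distance(x_adv, clean))
print("distance after: ", distance(x_opt, clean))
```

In this toy setting each step shrinks the offset from the mode by a constant factor, so the optimized point ends up much closer to the clean point than the perturbed input was. With a real image prior the score would come from a trained diffusion model rather than a closed-form expression.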