Most adversarial attacks and defenses focus on perturbations within small $\ell_p$-norm constraints. However, $\ell_p$ threat models cannot capture all relevant semantics-preserving perturbations, which limits the scope of robustness evaluations. In this work, we introduce Score-Based Adversarial Generation (ScoreAG), a novel framework that leverages advances in score-based generative models to generate adversarial examples beyond $\ell_p$-norm constraints, so-called unrestricted adversarial examples, thereby overcoming the limitations of $\ell_p$ threat models. Unlike traditional methods, ScoreAG maintains the core semantics of images while generating realistic adversarial examples, either by transforming existing images or by synthesizing new ones entirely from scratch. We further exploit the generative capability of ScoreAG to purify images, empirically enhancing the robustness of classifiers. Our extensive empirical evaluation demonstrates that ScoreAG matches the performance of state-of-the-art attacks and defenses across multiple benchmarks. This work highlights the importance of investigating adversarial examples bounded by semantics rather than by $\ell_p$-norm constraints. ScoreAG represents an important step towards more encompassing robustness assessments.
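To make the limitation of $\ell_p$ threat models concrete, the following toy sketch compares a one-pixel shift of an image against a typical $\ell_\infty$ budget. It is an illustration only and not part of ScoreAG: the synthetic 8x8 image, the budget of 8/255, and all helper names are assumptions chosen for this example. The shift leaves the depicted object unchanged, yet its $\ell_\infty$ distance to the original is far larger than the budget.

```python
# Toy illustration (not ScoreAG): a semantics-preserving change can be
# far outside a small l_inf ball. All names and values are hypothetical.

EPS = 8 / 255  # assumed l_inf budget for pixel values scaled to [0, 1]


def make_square_image(size=8, lo=2, hi=6):
    """Return a size x size image with a bright square on a dark background."""
    return [
        [1.0 if lo <= r < hi and lo <= c < hi else 0.0 for c in range(size)]
        for r in range(size)
    ]


def shift_right(image, pixels=1):
    """Shift every row to the right, padding the left edge with zeros."""
    return [[0.0] * pixels + row[: len(row) - pixels] for row in image]


def linf_distance(a, b):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


image = make_square_image()
shifted = shift_right(image, pixels=1)

dist = linf_distance(image, shifted)
changed_pixels = sum(
    x != y for ra, rb in zip(image, shifted) for x, y in zip(ra, rb)
)
exceeds_budget = dist > EPS

print(f"l_inf distance: {dist}, budget: {EPS:.4f}, exceeds: {exceeds_budget}")
```

Only the two vertical edges of the square change, but each affected pixel flips completely, so an $\ell_\infty$-bounded evaluation with this budget would never consider the shifted image.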