AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples

Adversarial examples are typically optimized with gradient-based attacks. While novel attacks are continuously proposed, each is shown to outperform its predecessors using different experimental setups, hyperparameter settings, and numbers of forward and backward calls to the target models. This yields overly optimistic and even biased evaluations that may unfairly favor one particular attack over the others. In this work, we aim to overcome these limitations by proposing AttackBench, the first evaluation framework that enables a fair comparison among different attacks. To this end, we first propose a categorization of gradient-based attacks, identifying their main components and differences. We then introduce our framework, which evaluates their effectiveness and efficiency. We measure these characteristics by (i) defining an optimality metric that quantifies how close an attack is to the optimal solution, and (ii) limiting the number of forward and backward queries to the model, such that all attacks are compared within a given maximum query budget. Our extensive experimental analysis compares more than $100$ attack implementations with a total of over $800$ different configurations against CIFAR-10 and ImageNet models, showing that only a few attacks outperform all the competing approaches. In this analysis, we also shed light on several implementation issues that prevent many attacks from finding better solutions or from running at all. We release AttackBench as a publicly available benchmark, and we aim to continuously update it to include and evaluate novel gradient-based attacks for optimizing adversarial examples.
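
To make point (ii) concrete, here is a minimal sketch of how a query budget can be enforced by wrapping the target model so that every forward and every backward (gradient) call is counted. Everything in it is a hypothetical illustration, not AttackBench's actual code: the names `BudgetedModel`, `QueryBudgetExceeded` and `run_attack` are invented, the "model" is a toy linear scorer whose input gradient is just its weight vector, and forward and backward calls are assumed to draw from one shared budget. The toy attack takes L-inf gradient-sign steps and keeps whatever it has when the budget runs out.

```python
from typing import List, Tuple


class QueryBudgetExceeded(RuntimeError):
    """Raised when an attack tries to query the model beyond its budget."""


class BudgetedModel:
    """Hypothetical wrapper counting forward and backward calls to a toy model.

    Assumption: both call types consume units from a single shared budget.
    """

    def __init__(self, weights: List[float], max_queries: int) -> None:
        self.weights = weights  # toy linear model: score(x) = w . x
        self.max_queries = max_queries
        self.forward_calls = 0
        self.backward_calls = 0

    def _consume(self) -> None:
        # Refuse the call if the shared budget is already spent.
        if self.forward_calls + self.backward_calls >= self.max_queries:
            raise QueryBudgetExceeded("query budget exhausted")

    def forward(self, x: List[float]) -> float:
        self._consume()
        self.forward_calls += 1
        return sum(w * xi for w, xi in zip(self.weights, x))

    def gradient(self, x: List[float]) -> List[float]:
        # For a linear score, the gradient w.r.t. the input is the weight vector.
        self._consume()
        self.backward_calls += 1
        return list(self.weights)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def run_attack(model: BudgetedModel, x: List[float], eps: float,
               step: float, max_steps: int) -> Tuple[List[float], bool]:
    """Toy L-inf gradient-sign attack that tries to push the score below zero.

    Returns the current perturbed input and whether the prediction flipped.
    Stops on success, after max_steps, or when the query budget runs out.
    """
    x_adv = list(x)
    try:
        for _ in range(max_steps):
            if model.forward(x_adv) < 0:  # prediction flipped: success
                return x_adv, True
            g = model.gradient(x_adv)
            # Step against the gradient, then project onto the eps-ball around x.
            x_adv = [
                min(max(xa - step * sign(gi), xi - eps), xi + eps)
                for xa, gi, xi in zip(x_adv, g, x)
            ]
    except QueryBudgetExceeded:
        pass  # budget spent: report the last point reached as a failure
    return x_adv, False
```

For example, with weights `[1.0, -2.0]` and input `[1.0, 0.25]`, the attack succeeds after three forward and two backward calls when given 10 queries, but fails when the budget is capped at 3. Under a fixed budget like this, attacks that spend queries inefficiently are penalized rather than silently granted extra iterations.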

翻译：对抗样本通常通过基于梯度的攻击方法进行优化。尽管新颖的攻击方法不断被提出，但每种方法均使用不同的实验设置、超参数配置以及对目标模型的前向与反向调用次数来证明其优于先前方法。这导致了过于乐观甚至存在偏见的评估结果，可能不公平地偏向某一特定攻击方法。在本工作中，我们旨在通过提出AttackBench（首个实现不同攻击方法公平比较的评估框架）来克服这些局限性。为此，我们首先对基于梯度的攻击方法进行分类，识别其主要组件与差异。随后引入我们的评估框架，从攻击效能与效率两方面进行评估。我们通过以下方式量化这些特性：（i）定义最优性度量指标，量化攻击结果与最优解的接近程度；（ii）限制对模型的前向与反向查询次数，确保所有攻击在给定的最大查询预算内进行比较。我们通过大量实验分析，在CIFAR-10和ImageNet模型上对比了超过100种攻击实现（总计超800种不同配置），结果表明仅有极少数攻击方法能全面优于所有竞争方法。在此分析过程中，我们揭示了若干阻碍攻击方法寻找更优解或正常运行的实现问题。我们将AttackBench作为公开可用的基准测试平台发布，旨在持续更新以纳入并评估用于优化对抗样本的新型梯度攻击方法。