AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples

Adversarial examples are typically optimized with gradient-based attacks. While novel attacks are continuously proposed, each is shown to outperform its predecessors under a different experimental setup, with different hyperparameter settings and a different number of forward and backward calls to the target models. This yields overly optimistic and even biased evaluations that may unfairly favor one particular attack over the others. In this work, we aim to overcome these limitations by proposing AttackBench, the first evaluation framework that enables a fair comparison among different attacks. To this end, we first propose a categorization of gradient-based attacks, identifying their main components and differences. We then introduce our framework, which evaluates their effectiveness and efficiency. We measure these characteristics by (i) defining an optimality metric that quantifies how close an attack is to the optimal solution, and (ii) limiting the number of forward and backward queries to the model, so that all attacks are compared within a given maximum query budget. Our extensive experimental analysis compares more than 100 attack implementations, with a total of over 800 different configurations, against CIFAR-10 and ImageNet models, highlighting that only very few attacks outperform all the competing approaches. Within this analysis, we shed light on several implementation issues that prevent many attacks from finding better solutions or from running at all. We release AttackBench as a publicly available benchmark and aim to update it continuously to include and evaluate novel gradient-based attacks for optimizing adversarial examples.
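The two measurement ideas in the abstract, a shared query budget and an optimality score, can be illustrated with a minimal sketch. It is not the AttackBench implementation: `BudgetedModel`, `toy_attack`, `curve_area`, and `optimality` are hypothetical names, the model is a one-dimensional toy, and the score below is only one plausible normalization, assuming the "optimal solution" is approximated by the smallest perturbation found for each sample by any attack.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class BudgetExceeded(RuntimeError):
    """Raised when an attack requests more queries than the budget allows."""


@dataclass
class BudgetedModel:
    """Hypothetical wrapper that charges one query per forward or backward call."""
    forward_fn: Callable[[float], float]
    grad_fn: Callable[[float], float]
    max_queries: int
    used: int = 0

    def _charge(self) -> None:
        if self.used >= self.max_queries:
            raise BudgetExceeded(f"budget of {self.max_queries} queries exhausted")
        self.used += 1

    def forward(self, x: float) -> float:
        self._charge()
        return self.forward_fn(x)

    def backward(self, x: float) -> float:
        self._charge()
        return self.grad_fn(x)


def toy_attack(model: BudgetedModel, x0: float, step: float) -> float:
    """Sign-gradient descent on the score; returns the perturbation size,
    or infinity if the budget runs out before the decision flips."""
    x = x0
    try:
        while model.forward(x) > 0:  # positive score: still correctly classified
            grad = model.backward(x)
            x -= step * (1.0 if grad > 0 else -1.0)
        return abs(x - x0)
    except BudgetExceeded:
        return float("inf")


def curve_area(distances: Sequence[float], eps_max: float) -> float:
    """Area under the robustness curve on [0, eps_max].
    The curve is the fraction of samples whose perturbation exceeds eps,
    so each sample contributes min(distance, eps_max) to the integral."""
    return sum(min(d, eps_max) for d in distances) / len(distances)


def optimality(attack: Sequence[float], best: Sequence[float], eps_max: float) -> float:
    """Assumed score: 1.0 when the attack matches the best-known distances,
    0.0 when it never succeeds within eps_max."""
    a, b = curve_area(attack, eps_max), curve_area(best, eps_max)
    if b >= eps_max:  # no attack succeeded, so there is no gap to measure
        return 1.0
    return 1.0 - (a - b) / (eps_max - b)


if __name__ == "__main__":
    # Toy model: score(x) = 2 - x, so the decision flips at x = 2.
    for budget in (20, 6):
        model = BudgetedModel(lambda x: 2.0 - x, lambda x: -1.0, max_queries=budget)
        dist = toy_attack(model, x0=0.0, step=0.5)
        print(f"budget={budget} distance={dist} queries_used={model.used}")
    print(optimality([3.0, 1.0], [2.0, 1.0], eps_max=4.0))
```

Because forward and backward calls draw from the same counter, an attack that spends its queries inefficiently fails under a tight budget even when it would succeed with more queries. That is the kind of effect a fixed budget is meant to expose.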
