Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant gap in evaluating their effectiveness against more recent, moderately robust models (e.g., those featured on the RobustBench leaderboard). In this paper, we question this lack of attention to robust models in the black-box attack literature. We establish a framework to evaluate the effectiveness of recent black-box attacks against both top-performing and standard defense mechanisms on the ImageNet dataset. Our empirical evaluation reveals the following key findings: (1) the most advanced black-box attacks struggle to succeed even against simple adversarially trained models; (2) robust models optimized to withstand strong white-box attacks, such as AutoAttack, also exhibit enhanced resilience against black-box attacks; and (3) robustness alignment between the surrogate models and the target model is a key factor in the success rate of transfer-based attacks.