Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant gap in evaluating their effectiveness against more recent, moderately robust models (e.g., those featured on the RobustBench leaderboard). In this paper, we question this lack of attention to robust models in the black-box attack literature. We establish a framework to evaluate the effectiveness of recent black-box attacks against both top-performing and standard defense mechanisms on the ImageNet dataset. Our empirical evaluation reveals the following key findings: (1) the most advanced black-box attacks struggle to succeed even against simple adversarially trained models; (2) robust models optimized to withstand strong white-box attacks, such as AutoAttack, also exhibit enhanced resilience against black-box attacks; and (3) robustness alignment between the surrogate models and the target model is a key factor in the success rate of transfer-based attacks.