Investigating Stateful Defenses Against Black-Box Adversarial Examples

Defending machine-learning (ML) models against white-box adversarial attacks has proven to be extremely difficult. Recent work has instead proposed stateful defenses that target a more restricted black-box attacker. These defenses track a history of incoming model queries and reject those that are suspiciously similar to earlier ones. The current state-of-the-art stateful defense, Blacklight, was proposed at USENIX Security '22 and claims to prevent nearly 100% of attacks on both the CIFAR10 and ImageNet datasets. In this paper, we observe that an attacker can significantly reduce the accuracy of a Blacklight-protected classifier (e.g., from 82.2% to 6.4% on CIFAR10) simply by adjusting the parameters of an existing black-box attack. This observation is surprising because the Blacklight authors had already evaluated these existing attacks; it motivates us to systematize stateful defenses in order to understand why existing stateful defense models fail. Finally, we propose a stronger evaluation strategy for stateful defenses, comprising adaptive score-based and hard-label black-box attacks. Using these attacks, we reduce even reconfigured versions of Blacklight to as low as 0% robust accuracy.
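To make the general mechanism concrete, the following is a minimal sketch of a stateful defense as the abstract describes it: keep a history of queries and reject any new query that is too similar to a stored one. It is not Blacklight's algorithm. The class name `StatefulDetector`, the L2 distance measure, the `threshold` value, and the bounded `history_size` are all assumptions chosen for illustration.

```python
import math
from collections import deque


class StatefulDetector:
    """Toy stateful defense: reject queries too close to an earlier query."""

    def __init__(self, threshold, history_size=1000):
        self.threshold = threshold  # hypothetical L2 distance cutoff
        self.history = deque(maxlen=history_size)  # bounded query history

    def is_suspicious(self, query):
        # Flag the query if any stored query lies within the threshold.
        return any(math.dist(query, past) < self.threshold
                   for past in self.history)

    def process(self, query, model):
        query = tuple(query)
        flagged = self.is_suspicious(query)
        self.history.append(query)  # record every query, flagged or not
        # Rejected queries get no model output (None) in this sketch.
        return None if flagged else model(query)


# Stand-in classifier used only for this example.
def toy_model(x):
    return int(sum(x) > 1.0)


detector = StatefulDetector(threshold=0.1)
print(detector.process([0.2, 0.3], toy_model))   # first query is answered
print(detector.process([0.21, 0.3], toy_model))  # near-duplicate is rejected
print(detector.process([0.9, 0.9], toy_model))   # distant query is answered
```

In this sketch the threshold governs how far apart consecutive queries must be to go unflagged, which illustrates why the query spacing chosen by an attack matters to a defense of this kind.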


