Black-box adversarial attacks have shown strong potential to subvert machine learning models. Existing black-box attacks craft adversarial examples by iteratively querying the target model and/or leveraging the transferability of a local surrogate model. Recently, such attacks have been effectively mitigated by state-of-the-art (SOTA) defenses, e.g., detection based on the pattern of sequential queries, or noise injection into the model. To the best of our knowledge, we take the first step toward a new paradigm of black-box attacks with provable guarantees: certifiable black-box attacks, which can guarantee the attack success probability (ASP) of adversarial examples (AEs) before querying the target model. Compared to traditional empirical black-box attacks, this new paradigm unveils significant vulnerabilities of machine learning models, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinitely many) AEs with high ASP, and theoretically guaranteeing the ASP of the generated AEs without any verification/queries on the target model. Specifically, we establish a novel theoretical foundation for ensuring the ASP of black-box attacks with randomized AEs. We then propose several novel techniques to craft randomized AEs while reducing the perturbation size for better imperceptibility. Finally, we comprehensively evaluate the certifiable black-box attacks on the CIFAR10/100, ImageNet, and LibriSpeech datasets, benchmarking against 16 SOTA empirical black-box attacks and various SOTA defenses in the domains of computer vision and speech recognition. Both theoretical and experimental results validate the significance of the proposed attacks.
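The abstract does not spell out the certification procedure, so the following is only a rough, hypothetical illustration of the underlying idea: lower-bounding the ASP of a Gaussian-randomized AE by Monte Carlo sampling on a local surrogate model, using a Clopper-Pearson confidence bound. The names `surrogate`, `sigma`, and `alpha` are illustrative assumptions, not the paper's API or its actual certification method.

```python
# Minimal sketch (NOT the paper's algorithm): estimate a high-confidence
# lower bound on the attack success probability (ASP) of a randomized
# adversarial example, i.e., the probability that a Gaussian-perturbed
# copy of x_adv is misclassified. `surrogate` is an assumed local model
# exposing a .predict() method that returns class labels.
import numpy as np
from scipy.stats import beta


def asp_lower_bound(surrogate, x_adv, true_label, sigma=0.25,
                    n_samples=1000, alpha=0.001):
    """One-sided (1 - alpha) Clopper-Pearson lower bound on the ASP of
    x_adv under isotropic Gaussian randomization with std `sigma`."""
    noise = np.random.normal(0.0, sigma, size=(n_samples,) + x_adv.shape)
    preds = surrogate.predict(x_adv[None] + noise)       # shape: (n_samples,)
    successes = int(np.sum(preds != true_label))         # misclassified draws
    if successes == 0:
        return 0.0
    # Exact binomial lower bound via the Beta quantile.
    return beta.ppf(alpha, successes, n_samples - successes + 1)
```

Under these assumptions, if `asp_lower_bound(...)` returns, say, 0.9, then any AE drawn from the Gaussian around `x_adv` succeeds with probability at least 0.9 (with confidence 1 - alpha) against the surrogate; the paper's contribution is to make such a guarantee hold for the unseen target model, which this sketch does not capture.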