SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks

Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those produced by query-based black-box attacks. Despite various efforts to detect and prevent such attacks, a more comprehensive approach to logging, analyzing, and sharing evidence of attacks is still needed. While classic security benefits from well-established forensics and intelligence sharing, Machine Learning has yet to find a way to profile its attackers and share information about them. In response, this paper introduces SEA, a novel ML security system that characterizes black-box attacks on ML systems for forensic purposes and facilitates human-explainable intelligence sharing. SEA leverages the Hidden Markov Model framework to attribute an observed query sequence to known attacks. It thus captures the attack's progression rather than focusing only on the final adversarial examples. Our evaluations reveal that SEA is effective at attack attribution, even on an attack's second occurrence, and is robust to adaptive strategies designed to evade forensic analysis. Interestingly, SEA's explanations of attack behavior even allow us to fingerprint specific minor implementation bugs in attack libraries. For example, we discover that more than 50% of the queries sent by the ART v1.14 implementations of the SignOPT and Square attacks are specific zero-difference queries. We thoroughly evaluate SEA in a variety of settings and demonstrate that it can recognize the same attack's second occurrence with 90+% Top-1 and 95+% Top-3 accuracy.
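The abstract does not describe how SEA turns queries into HMM observations or how its models are trained, so the following is only a minimal sketch of HMM likelihood-based attribution under stated assumptions. Consecutive queries are discretized into three hypothetical symbols (zero-difference query, small step, large jump); each known attack is represented by a hand-set discrete HMM; and an observed sequence is attributed by ranking attacks by their forward-algorithm log-likelihood, which directly yields Top-1 and Top-3 candidates. The names `discretize`, `DiscreteHMM`, `rank_attacks`, the `small` threshold, and the attack labels are all illustrative, not SEA's actual API or models.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def discretize(queries: Sequence[Sequence[float]], small: float = 0.1) -> List[int]:
    """Map each pair of consecutive queries to a symbol (hypothetical scheme):
    0 = zero-difference query, 1 = small step (L2 distance < small), 2 = large jump."""
    symbols = []
    for prev, cur in zip(queries, queries[1:]):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, cur)))
        symbols.append(0 if dist == 0.0 else (1 if dist < small else 2))
    return symbols


@dataclass
class DiscreteHMM:
    start: List[float]        # start[i]    = P(first hidden state = i)
    trans: List[List[float]]  # trans[i][j] = P(next state = j | current state = i)
    emit: List[List[float]]   # emit[i][o]  = P(symbol o | hidden state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | this model)."""
        n = len(self.start)
        log_lik = 0.0
        alpha: List[float] = []
        for t, o in enumerate(obs):
            if t == 0:
                alpha = [self.start[i] * self.emit[i][o] for i in range(n)]
            else:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][o]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            log_lik += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        return log_lik


def rank_attacks(obs: Sequence[int], models: Dict[str, DiscreteHMM], k: int = 3) -> List[str]:
    """Return the k attack labels whose HMMs best explain obs (Top-k attribution)."""
    scores = {name: model.log_likelihood(obs) for name, model in models.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Toy, hand-set models for three hypothetical attacks (not fitted to any real attack).
EXAMPLE_MODELS = {
    "zero_heavy": DiscreteHMM([1.0], [[1.0]], [[0.6, 0.3, 0.1]]),
    "step_heavy": DiscreteHMM([1.0], [[1.0]], [[0.05, 0.8, 0.15]]),
    "two_phase": DiscreteHMM(
        [0.9, 0.1],
        [[0.7, 0.3], [0.1, 0.9]],
        [[0.1, 0.2, 0.7], [0.1, 0.8, 0.1]],  # state 0: exploration, state 1: refinement
    ),
}

obs = [0, 0, 1, 0, 0, 2, 0, 0]
print(rank_attacks(obs, EXAMPLE_MODELS))  # ['zero_heavy', 'two_phase', 'step_heavy']
```

Returning a ranked list rather than a single label makes Top-k reporting straightforward; in practice the per-attack HMMs would be fitted to recorded query traces rather than set by hand.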

Related content

Black box

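In science, computing, and engineering, a black box is a device, system, or object that can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). Almost anything can be called a black box: a transistor, an engine, an algorithm, the human brain, an institution, or a government. To analyze something modeled as an open system with a typical "black-box approach", only its stimulus/response behavior is considered in order to infer the (unknown) box. The usual representation of such a black-box system is a data flow diagram centered on the box. The opposite of a black box is a system whose internal components or logic are available for inspection, usually called a white box (sometimes also a "clear box" or "glass box").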

[NeurIPS 2021] GraphFormers: A GNN-Nested Transformer Model for Textual Graph Representation Learning

Introduction to Linux (96-slide deck)

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
