Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those crafted by query-based black-box attacks. Despite various efforts to detect and prevent such attacks, a more comprehensive approach to logging, analyzing, and sharing evidence of attacks is still needed. While classic security benefits from well-established forensics and intelligence sharing, ML has yet to find a way to profile its attackers and share information about them. In response, this paper introduces SEA, a novel ML security system that characterizes black-box attacks on ML systems for forensic purposes and facilitates human-explainable intelligence sharing. SEA leverages the Hidden Markov Model framework to attribute an observed query sequence to known attacks. It thus captures the attack's progression rather than focusing only on the final adversarial examples. Our evaluations reveal that SEA is effective at attack attribution, even on an attack's second occurrence, and is robust to adaptive strategies designed to evade forensic analysis. Interestingly, SEA's explanations of attack behavior even allow us to fingerprint specific minor implementation bugs in attack libraries. For example, we discover that the SignOPT and Square attack implementations in ART v1.14 send over 50% specific zero-difference queries. We thoroughly evaluate SEA in a variety of settings and demonstrate that it can recognize the same attack's second occurrence with 90+% Top-1 and 95+% Top-3 accuracy.
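To make the attribution idea concrete, the following is a minimal sketch of scoring a query trace against several candidate Hidden Markov Models and ranking the candidates by likelihood. It is not SEA's implementation: the observation alphabet (0 = zero-difference query, 1 = small perturbation, 2 = large perturbation), the model names `attack_A` and `attack_B`, and all probabilities are hypothetical values chosen for illustration only.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM.

    start[s]    : probability of starting in hidden state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits observation symbol o
    """
    if not obs:
        return 0.0  # an empty trace carries no evidence
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the trace is impossible under this model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def attribute(obs, models):
    """Rank candidate attack models by the log-likelihood of the trace."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy models: (start, trans, emit). Not taken from the paper.
MODELS = {
    # Mostly emits zero-difference queries (symbol 0) from its first state.
    "attack_A": (
        [0.9, 0.1],
        [[0.8, 0.2], [0.3, 0.7]],
        [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    ),
    # Rarely emits zero-difference queries.
    "attack_B": (
        [0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    ),
}

if __name__ == "__main__":
    trace = [0, 0, 0, 1, 0, 0, 2, 0]  # dominated by zero-difference queries
    print(attribute(trace, MODELS))
```

A ranked list, rather than a single label, mirrors the Top-1 and Top-3 accuracy figures reported above: the first entry is the Top-1 attribution and the first three entries form the Top-3 candidate set.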