Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet our understanding of this phenomenon stems from a rather fragmented pool of knowledge: at present, there are a handful of attacks, each with disparate threat-model assumptions and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterizing worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on both robust and non-robust models, seven datasets, and three extended lp-based threat models that incorporate compute costs, formalizing the Space of Adversarial Strategies. From our evaluation, we find attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain and threat model, and (2) go beyond the handful of known attacks used today.
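To make the idea of an ensemble that upper-bounds attack performance concrete, the following is a minimal sketch and not the paper's definition of the PEA. It assumes each attack reports, per input, the perturbation norm of its adversarial example (or infinity on failure), and it ignores the compute-cost dimension of the extended threat models. The function names, attack names, and toy numbers are hypothetical.

```python
from math import inf


def ensemble_costs(per_attack_costs):
    """For each input, keep the smallest perturbation norm any attack achieved.

    `per_attack_costs` maps an attack name to a list of per-input norms,
    where `inf` marks an input on which that attack failed.
    """
    n_inputs = len(next(iter(per_attack_costs.values())))
    return [
        min(costs[i] for costs in per_attack_costs.values())
        for i in range(n_inputs)
    ]


def success_rate(costs, budget):
    """Fraction of inputs attacked successfully within the norm budget."""
    return sum(c <= budget for c in costs) / len(costs)


# Hypothetical per-input norms for two illustrative attacks.
toy = {
    "attack_a": [0.10, inf, 0.30, 0.25],
    "attack_b": [0.20, 0.15, inf, 0.05],
}
best = ensemble_costs(toy)
```

Because the ensemble takes the per-input minimum, its success rate at any budget is at least that of every individual attack in this toy setting, which is the sense in which such an ensemble serves as a reference point for comparing attacks.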