Evasion attacks threaten machine learning models: adversaries attempt to mislead classifiers by injecting malicious samples. An alarming property of these attacks is their ability to transfer across different models, known as transferability. An attacker can therefore craft adversarial samples on a custom (surrogate) model and later use them against a victim organization's model. Although the literature widely discusses how adversaries can transfer their attacks, the experimental settings considered are limited and far from reality. For instance, many experiments assume that attacker and defender share the same dataset, balance level (i.e., how the ground truth is distributed), and model architecture. In this work, we propose the DUMB attacker model. This framework makes it possible to analyze whether evasion attacks fail to transfer when the training conditions of the surrogate and victim models differ. DUMB considers the following conditions: Dataset soUrces, Model architecture, and the Balance of the ground truth. We then propose a novel testbed to evaluate many state-of-the-art evasion attacks with DUMB; the testbed consists of three computer vision tasks, each with two distinct datasets, four balance levels, and three model architectures. Our analysis, which generated 13K tests over 14 distinct attacks, led to numerous novel findings on transferable attacks with surrogate models. In particular, mismatches between attackers and victims in dataset source, balance level, and model architecture lead to a non-negligible loss of attack performance.
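As an illustration of the testbed dimensions described above, the following minimal sketch enumerates, for a single task, the training configurations (two datasets, four balance levels, three architectures) and groups every ordered surrogate/victim pair by which of the three DUMB conditions differ. All dataset, balance, and architecture names are placeholders, since the abstract gives only the counts, and the helper functions are hypothetical. The sketch does not attempt to reproduce the 13K tests, which also depend on the attacks and tasks.

```python
from itertools import product

# Placeholder names (assumptions): the abstract states only how many
# datasets, balance levels, and architectures each task has.
DATASETS = ["source_A", "source_B"]
BALANCE_LEVELS = ["balance_1", "balance_2", "balance_3", "balance_4"]
ARCHITECTURES = ["arch_1", "arch_2", "arch_3"]


def training_configs():
    """All (dataset, balance, architecture) combinations for one task."""
    return list(product(DATASETS, BALANCE_LEVELS, ARCHITECTURES))


def mismatch_profile(surrogate, victim):
    """Return (dataset_differs, balance_differs, architecture_differs)."""
    return tuple(s != v for s, v in zip(surrogate, victim))


def count_pairs_by_profile():
    """Count ordered surrogate/victim pairs per mismatch profile."""
    configs = training_configs()
    counts = {}
    for surrogate, victim in product(configs, repeat=2):
        profile = mismatch_profile(surrogate, victim)
        counts[profile] = counts.get(profile, 0) + 1
    return counts


if __name__ == "__main__":
    for profile, n in sorted(count_pairs_by_profile().items()):
        print(profile, n)
```

Under these assumptions, each task has 24 training configurations and 576 ordered surrogate/victim pairs, spread over the 8 possible match/mismatch profiles; only 24 of those pairs correspond to the fully matched setting that many prior experiments assume.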