The transferability of adversarial examples is a key issue in the security of deep neural networks. The possibility that an adversarial example crafted for a source model also fools a different target model makes the threat of adversarial attacks more realistic. Measuring transferability is therefore crucial, but the Attack Success Rate alone does not provide a sound evaluation. This paper proposes a new methodology for evaluating transferability that places distortion at its center. This methodology shows that transferable attacks may perform far worse than a black-box attack if the attacker picks the source model at random. To address this issue, we propose a new selection mechanism, called FiT, which aims to choose the best source model with only a few preliminary queries to the target. Our experimental results show that FiT is highly effective at selecting the best source model in multiple scenarios, such as single-model attacks, ensemble-model attacks, and multiple attacks (code available at: https://github.com/t-maho/transferability_measure_fit).