Adversarial transferability in black-box scenarios presents a unique challenge: while attackers can employ surrogate models to craft adversarial examples, they have no assurance that these examples will successfully compromise the target model. Until now, the prevalent method of ascertaining success has been trial and error: testing crafted samples directly on the victim model. This approach, however, risks detection with every attempt, forcing attackers to either succeed on the first try or face exposure. Our paper introduces a ranking strategy that refines the transfer attack process, enabling the attacker to estimate the likelihood of success without repeated trials on the victim's system. By leveraging a set of diverse surrogate models, our method can predict the transferability of adversarial examples. This strategy can be used either to select the best sample to use in an attack or to select the best perturbation to apply to a specific sample. Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20% (akin to random selection) to near upper-bound levels, with some scenarios even reaching a 100% success rate. This substantial improvement not only sheds light on the shared susceptibilities across diverse architectures but also demonstrates that attackers can forgo detectable trial-and-error tactics, increasing the threat posed by surrogate-based attacks.
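The ranking idea described above can be sketched as follows. This is a minimal illustration, not the paper's exact scoring rule: it assumes each surrogate is a callable returning class logits, and scores a candidate adversarial example by the fraction of surrogates it fools, then ranks candidates by that score. The function names (`transferability_score`, `rank_candidates`) and the toy linear surrogates are our own constructions for demonstration.

```python
import numpy as np

def transferability_score(adv_example, true_label, surrogates):
    """Estimate transferability as the fraction of surrogate models fooled.

    A surrogate is "fooled" when its predicted class differs from the
    true label of the underlying clean sample.
    """
    fooled = [int(np.argmax(m(adv_example)) != true_label) for m in surrogates]
    return sum(fooled) / len(fooled)

def rank_candidates(candidates, true_label, surrogates):
    """Rank candidate adversarial examples by estimated transferability.

    Returns candidate indices, best (most transferable) first. The same
    routine ranks either different samples or different perturbations of
    one sample.
    """
    scored = [(transferability_score(x, true_label, surrogates), i)
              for i, x in enumerate(candidates)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [i for _, i in scored]

# Toy demo: three linear "surrogates" over a 1-D input, true class 0.
# Each returns logits [x + b, 0], so class 1 is predicted when x + b < 0.
surrogates = [lambda x, b=b: np.array([x[0] + b, 0.0]) for b in (0.0, 0.5, 1.0)]
strong = np.array([-2.0])  # fools all three surrogates
weak = np.array([2.0])     # fools none

print(transferability_score(strong, 0, surrogates))  # 1.0
print(rank_candidates([strong, weak], 0, surrogates))  # [0, 1]
```

In practice the candidates would be adversarial images and the surrogates pretrained networks of diverse architectures; the intuition is that an example fooling many independent surrogates is more likely to transfer to the unseen victim.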