Intermediate-level attacks, which drastically perturb feature representations along an adversarial direction, have shown favorable performance in crafting transferable adversarial examples. Existing methods in this category are normally formulated in two separate stages: a directional guide is determined first, and the scalar projection of the intermediate-level perturbation onto that guide is then enlarged. The resulting perturbation inevitably deviates from the guide in feature space, and we show in this paper that such deviation may lead to a sub-optimal attack. To address this issue, we develop a novel intermediate-level method that crafts adversarial examples within a single stage of optimization. In particular, the proposed method, named intermediate-level perturbation decay (ILPD), encourages the intermediate-level perturbation to lie in an effective adversarial direction and to have a large magnitude simultaneously. In-depth discussion verifies the effectiveness of our method. Experimental results show that it outperforms state-of-the-art methods by large margins in attacking various victim models on ImageNet (+10.07% on average) and CIFAR-10 (+3.88% on average). Our code is at https://github.com/qizhangli/ILPD-attack.
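To make the two quantities discussed above concrete, the following toy sketch computes the scalar projection of a feature-space perturbation onto a directional guide, together with the cosine similarity that measures how far the perturbation deviates from that guide. It is only an illustration under stated assumptions: the two-dimensional vectors and the helper names (`scalar_projection`, `cosine_similarity`) are hypothetical, and this is not the authors' implementation of ILPD or of any two-stage baseline.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def scalar_projection(perturbation, guide):
    """Signed length of `perturbation` along the direction of `guide`."""
    return dot(perturbation, guide) / norm(guide)


def cosine_similarity(perturbation, guide):
    """Cosine of the angle between `perturbation` and `guide` (1.0 = aligned)."""
    return dot(perturbation, guide) / (norm(perturbation) * norm(guide))


# Hypothetical toy vectors standing in for intermediate-level features.
guide = [1.0, 0.0]     # directional guide (assumed fixed in a first stage)
aligned = [3.0, 0.0]   # perturbation exactly along the guide
deviated = [3.0, 4.0]  # perturbation with a large off-guide component

proj_aligned = scalar_projection(aligned, guide)
proj_deviated = scalar_projection(deviated, guide)
cos_aligned = cosine_similarity(aligned, guide)
cos_deviated = cosine_similarity(deviated, guide)

print(proj_aligned, proj_deviated)  # same scalar projection for both
print(cos_aligned, cos_deviated)    # but different alignment with the guide
```

In this toy case both perturbations have the same scalar projection onto the guide, yet only one of them is aligned with it. This is the sense in which enlarging the projection alone does not constrain the direction of the perturbation.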