Intermediate-level attacks, which attempt to drastically perturb feature representations along an adversarial direction, have shown favorable performance in crafting transferable adversarial examples. Existing methods in this category are normally formulated in two separate stages: a directional guide is determined first, and the scalar projection of the intermediate-level perturbation onto that guide is then enlarged. The resulting perturbation inevitably deviates from the guide in feature space, and we show in this paper that such a deviation may lead to sub-optimal attacks. To address this issue, we develop a novel intermediate-level method that crafts adversarial examples within a single stage of optimization. In particular, the proposed method, named intermediate-level perturbation decay (ILPD), encourages the intermediate-level perturbation to lie in an effective adversarial direction and to have a large magnitude simultaneously. In-depth discussion verifies the effectiveness of our method. Experimental results show that it outperforms state-of-the-art methods by large margins in attacking various victim models on ImageNet (+10.07% on average) and CIFAR-10 (+3.88% on average). Our code is at https://github.com/qizhangli/ILPD-attack.
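To make the two quantities in the abstract concrete, the following is a minimal sketch of the scalar projection of an intermediate-level perturbation onto a directional guide, and of how far the perturbation deviates from that guide. It is an illustration only, not the ILPD implementation from the repository: the helper names (`scalar_projection`, `cosine_deviation`) and the toy two-dimensional feature vectors are assumptions made here, and real feature maps would be flattened, high-dimensional tensors.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def scalar_projection(perturbation, guide):
    """Length of `perturbation` along the direction of `guide`.

    This is the quantity that two-stage intermediate-level attacks enlarge.
    (Hypothetical helper, named for illustration.)
    """
    return dot(perturbation, guide) / norm(guide)


def cosine_deviation(perturbation, guide):
    """1 - cosine similarity between perturbation and guide.

    0 means the perturbation is perfectly aligned with the guide;
    larger values mean it has drifted away from the guide direction.
    (Hypothetical helper, named for illustration.)
    """
    return 1.0 - dot(perturbation, guide) / (norm(perturbation) * norm(guide))


# Toy example: the guide points along the first feature axis, while the
# perturbation has a large component orthogonal to it.
guide = [1.0, 0.0]
perturbation = [3.0, 4.0]

projection = scalar_projection(perturbation, guide)
deviation = cosine_deviation(perturbation, guide)
print(f"scalar projection: {projection:.2f}, deviation: {deviation:.2f}")
```

In this toy case the perturbation has norm 5 but a scalar projection of only 3, so enlarging the projection does not by itself keep the perturbation aligned with the guide. This is the kind of deviation the abstract identifies as a possible source of sub-optimal attacks.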