In transfer-based adversarial attacks, adversarial examples are generated solely on surrogate models and are expected to remain effective against victim models. Considerable effort has gone into improving the transferability of adversarial examples produced by transfer-based attacks. Even so, our investigation finds that the large update step length of current transfer-based attacks causes a large deviation between the actual and the steepest update directions, which prevents the generated adversarial examples from converging well. Directly reducing the update step length does not solve this, because it leads to severe update oscillation, so the generated adversarial examples still fail to transfer well to the victim models. To address these issues, we propose a novel transfer-based attack, named the direction tuning attack. It decreases the update deviation under a large update step length and also mitigates the update oscillation under a small sampling step length, so that the generated adversarial examples converge well and transfer effectively to victim models. In addition, we propose a network pruning method that smooths the decision boundary, further decreasing the update oscillation and enhancing the transferability of the generated adversarial examples. Experimental results on ImageNet show that, compared with the latest gradient-based attacks, our method raises the average attack success rate (ASR) from 87.9\% to 94.5\% on five victim models without defenses and from 69.1\% to 76.2\% against eight advanced defense methods.
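The abstract describes direction tuning only at a high level. The following is a minimal toy sketch of one plausible reading of the idea, not the authors' algorithm. Each outer step of length α = ε/T takes its direction from gradients accumulated along k inner sampling steps of length α/k. The function name `direction_tuning_attack`, the mean-absolute gradient normalisation, and the L∞ clipping are illustrative assumptions. `grad_fn` is a hypothetical stand-in for the surrogate model's loss gradient with respect to the input.

```python
from typing import Callable, List

Vector = List[float]


def sign(v: float) -> float:
    """Return 1.0, -1.0 or 0.0 depending on the sign of v."""
    return float((v > 0) - (v < 0))


def clip(v: float, lo: float, hi: float) -> float:
    """Clamp v to the interval [lo, hi]."""
    return max(lo, min(hi, v))


def direction_tuning_attack(
    x0: Vector,
    grad_fn: Callable[[Vector], Vector],
    eps: float,
    num_iters: int,
    k: int,
) -> Vector:
    """Toy sketch (assumed scheme): sign-gradient ascent inside an L-inf ball.

    The direction of each large step (length alpha) is the sum of k
    normalised gradients sampled along a look-ahead path of k small
    steps (length alpha / k), rather than a single gradient at x.
    """
    alpha = eps / num_iters  # large (outer) update step length
    x = list(x0)
    for _ in range(num_iters):
        x_inner = list(x)
        g_sum = [0.0] * len(x)
        for _ in range(k):
            g = grad_fn(x_inner)
            # Normalise so every sampled gradient contributes on a similar scale.
            scale = sum(abs(gi) for gi in g) / len(g) + 1e-12
            g = [gi / scale for gi in g]
            g_sum = [s + gi for s, gi in zip(g_sum, g)]
            # Small sampling step along the current gradient sign, kept in the ball.
            x_inner = [
                clip(xi + (alpha / k) * sign(gi), x0i - eps, x0i + eps)
                for xi, gi, x0i in zip(x_inner, g, x0)
            ]
        # Actual update: one large step in the tuned (accumulated) direction.
        x = [
            clip(xi + alpha * sign(si), x0i - eps, x0i + eps)
            for xi, si, x0i in zip(x, g_sum, x0)
        ]
    return x
```

For example, with a constant gradient `[1.0, -2.0]`, `eps=0.3`, `num_iters=3` and `k=4`, every tuned direction has sign `[1, -1]` and the result is approximately `[0.3, -0.3]`. With `k=1` the sketch reduces to a plain iterative sign-gradient attack, which makes the role of the inner sampling steps easy to compare.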