In transfer-based adversarial attacks, adversarial examples are generated using only surrogate models, yet must still perturb the victim models effectively. Although considerable effort has been devoted to improving the transferability of adversarial examples generated by transfer-based attacks, our investigation found that the large update step length used by current transfer-based attacks causes a large deviation between the actual update direction and the steepest update direction, so the generated adversarial examples cannot converge well. However, directly reducing the update step length leads to serious update oscillation, so the generated adversarial examples still cannot achieve high transferability to the victim models. To address these issues, we propose a novel transfer-based attack, the direction tuning attack, which not only decreases the update deviation under a large step length but also mitigates the update oscillation under a small sampling step length, so that the generated adversarial examples converge well and achieve high transferability to victim models. In addition, we propose a network pruning method that smooths the decision boundary, further decreasing the update oscillation and enhancing the transferability of the generated adversarial examples. Experimental results on ImageNet demonstrate that, compared with the latest gradient-based attacks, our method improves the average attack success rate (ASR) of the generated adversarial examples from 87.9\% to 94.5\% on five victim models without defenses, and from 69.1\% to 76.2\% on eight advanced defense methods.
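For readers unfamiliar with the role of the update step length, the following is a minimal sketch of a plain iterative sign-gradient update (in the style of I-FGSM) on a toy logistic surrogate. It is not the direction tuning attack or the pruning method described above, whose details the abstract does not give. The toy model, the function names `loss_and_grad` and `iterative_sign_attack`, and the choice of step length `eps / num_steps` are all assumptions made for illustration.

```python
import numpy as np


def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic surrogate and its gradient w.r.t. x.

    The logistic model is a stand-in assumption; a real surrogate would be a
    deep network differentiated with an autodiff framework.
    """
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # d(loss)/dz = p - y, and dz/dx = w
    return loss, grad_x


def iterative_sign_attack(x, y, w, b, eps, num_steps):
    """Baseline iterative sign-gradient ascent inside an L-infinity ball.

    Each iteration moves by step = eps / num_steps along sign(gradient), so
    fewer iterations mean a larger step length per update.
    """
    step = eps / num_steps
    x_adv = x.copy()
    for _ in range(num_steps):
        _, grad = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + step * np.sign(grad)
        # Project back so the total perturbation never exceeds eps.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

In this sketch, `num_steps` controls the trade-off the abstract discusses: with the perturbation budget `eps` fixed, a small `num_steps` gives a large step length per update, and a large `num_steps` gives a small one.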