Transfer-based attacks use adversarial examples generated on a surrogate model to attack other models, which makes them applicable in the physical world and has attracted increasing interest. Recently, various attacks have emerged that boost adversarial transferability from different perspectives. In this work, inspired by the observation that flat local minima are correlated with good generalization, we hypothesize and empirically validate that adversarial examples in a flat local region tend to transfer well, by adding a gradient-norm penalty to the original loss function. Since directly optimizing the gradient-norm penalty is computationally expensive and intractable when generating adversarial examples, we propose an approximate optimization method that simplifies the gradient update of the objective function. Specifically, we randomly sample an example and adopt a first-order procedure to approximate the Hessian-vector product (the curvature term), which makes the computation more efficient by interpolating two neighboring gradients. Meanwhile, to obtain a more stable gradient direction, we randomly sample multiple examples and average their gradients, which reduces the variance introduced by random sampling during the iterative process. Extensive experiments on the ImageNet-compatible dataset show that the proposed method generates adversarial examples in flat local regions and significantly improves adversarial transferability on both normally trained and adversarially trained models compared with state-of-the-art attacks. Our code is available at: https://github.com/Trustworthy-AI-Group/PGN.
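The first-order approximation described in the abstract can be illustrated with a small sketch. It assumes the penalized objective has the form J(x) - λ‖∇J(x)‖₂, so its gradient contains a Hessian-vector product; a finite difference along the negative normalized gradient replaces that product, and the result reduces to a weighted blend of two neighboring gradients with weight δ = λ/α. The function names, the toy quadratic loss, and the parameters `n_samples`, `zeta`, `alpha`, and `delta` are illustrative assumptions, not taken from the authors' repository.

```python
import numpy as np


def toy_loss_grad(x, A):
    """Gradient of the toy loss J(x) = 0.5 * x^T A x.

    Hypothetical stand-in for the gradient of a surrogate model's loss.
    """
    return A @ x


def flat_region_gradient(x, grad_fn, rng, n_samples=20, zeta=0.1,
                         alpha=0.01, delta=0.5):
    """Average an interpolation of two neighboring gradients over random neighbors of x.

    For each sampled neighbor x_s:
      g1 = grad J(x_s)
      g2 = grad J(x_s - alpha * g1 / ||g1||)
    The blend (1 - delta) * g1 + delta * g2 is a finite-difference estimate of
    the gradient of J - lam * ||grad J||_2 with delta = lam / alpha, so no
    Hessian is ever formed. Averaging over neighbors reduces sampling variance.
    """
    g_avg = np.zeros_like(x)
    for _ in range(n_samples):
        # Random neighbor inside an L-infinity ball of radius zeta (assumed sampling scheme).
        x_s = x + rng.uniform(-zeta, zeta, size=x.shape)
        g1 = grad_fn(x_s)
        # Small step along the negative normalized gradient.
        step = alpha * g1 / (np.linalg.norm(g1) + 1e-12)
        g2 = grad_fn(x_s - step)
        g_avg += (1.0 - delta) * g1 + delta * g2
    return g_avg / n_samples


if __name__ == "__main__":
    A = np.diag([1.0, 4.0])
    x = np.array([1.0, 1.0])
    rng = np.random.default_rng(0)
    g = flat_region_gradient(x, lambda z: toy_loss_grad(z, A), rng)
    # One sign-based ascent step, as is common in iterative attacks.
    x_adv = x + 0.05 * np.sign(g)
    print(g, x_adv)
```

Each sample costs two gradient evaluations, so the sketch needs `2 * n_samples` backward passes per iteration instead of any second-order computation. On the quadratic toy loss the finite difference is exact, which makes the blend easy to check by hand.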