Transfer-based attacks craft adversarial examples on a surrogate model and use them to mislead other black-box models without any access to those models, which makes them a promising way to attack real-world applications. Several recent works aim to boost adversarial transferability, but they usually overlook the surrogate model itself. In this work, we identify that non-linear layers (e.g., ReLU and max-pooling) truncate the gradient during backward propagation, so that the gradient w.r.t. the input image reflects the loss function only imprecisely. We hypothesize and empirically validate that such truncation undermines the transferability of adversarial examples. Based on these findings, we propose a novel method called Backward Propagation Attack (BPA), which increases the relevance between the gradient w.r.t. the input image and the loss function so as to generate adversarial examples with higher transferability. Specifically, BPA adopts a non-monotonic function as the derivative of ReLU and incorporates softmax with temperature to smooth the derivative of max-pooling, thereby mitigating the information loss during the backward propagation of gradients. Empirical results on the ImageNet dataset demonstrate that our method not only substantially boosts adversarial transferability but is also generally applicable to existing transfer-based attacks.
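The two backward-pass modifications can be illustrated with a minimal sketch. The abstract does not specify which non-monotonic function replaces the ReLU derivative or how the temperature enters the softmax, so both choices below are assumptions made for illustration: the derivative of SiLU stands in for the non-monotonic function, and the temperature divides the pooling-window activations. All function names are hypothetical, and the pooling window is treated as a flat list of activations.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu_backward_standard(z, grad_out):
    # Ordinary ReLU backward pass: the gradient is truncated to zero
    # wherever the pre-activation is not positive.
    return [g if x > 0 else 0.0 for x, g in zip(z, grad_out)]


def relu_backward_smooth(z, grad_out):
    # Assumed surrogate: derivative of SiLU, sigma(z) * (1 + z * (1 - sigma(z))).
    # It is non-monotonic and nonzero for negative inputs, so gradient
    # information still flows through "inactive" units.
    out = []
    for x, g in zip(z, grad_out):
        s = sigmoid(x)
        out.append(g * s * (1.0 + x * (1.0 - s)))
    return out


def maxpool_backward_standard(window, grad_out):
    # Ordinary max-pooling backward pass: only the arg-max position
    # in the window receives the incoming gradient.
    grads = [0.0] * len(window)
    grads[window.index(max(window))] = grad_out
    return grads


def maxpool_backward_softmax(window, grad_out, temperature=1.0):
    # Assumed smoothing: distribute the incoming gradient over the window
    # with softmax(window / temperature) weights instead of a one-hot mask.
    m = max(window)
    exps = [math.exp((x - m) / temperature) for x in window]
    total = sum(exps)
    return [grad_out * e / total for e in exps]


if __name__ == "__main__":
    z = [-2.0, -0.5, 0.5, 2.0]
    ones = [1.0] * len(z)
    print("ReLU standard:", relu_backward_standard(z, ones))
    print("ReLU smooth:  ", relu_backward_smooth(z, ones))

    window = [1.0, 3.0, 2.0]
    print("Pool standard:", maxpool_backward_standard(window, 1.0))
    print("Pool softmax: ", maxpool_backward_softmax(window, 1.0, temperature=1.0))
```

Under these assumptions, a small temperature makes the softmax weights approach the one-hot mask of standard max-pooling, while a larger temperature spreads the gradient more evenly across the window. Only the backward pass is altered in this sketch; the forward computation is left as ordinary ReLU and max-pooling.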