Introducing Competition to Boost the Transferability of Targeted Adversarial Examples through Clean Feature Mixup

Deep neural networks are widely known to be susceptible to adversarial examples, which cause incorrect predictions through subtle input modifications. These adversarial examples tend to transfer between models, but targeted attacks still achieve lower success rates because decision boundaries vary significantly across models. To enhance the transferability of targeted adversarial examples, we propose introducing competition into the optimization process. Our idea is to craft adversarial perturbations in the presence of two new types of competitor noise: adversarial perturbations towards different target classes and friendly perturbations towards the correct class. With these competitors present, even if an adversarial example deceives a network into extracting specific features that lead to the target class, this disturbance can be suppressed by the competitors. Within this competition, adversarial examples must therefore adopt different attack strategies that leverage more diverse features to overwhelm the interference, which improves their transferability to different models. To limit the computational cost, we efficiently simulate the various interference from these two types of competitors in feature space by randomly mixing up stored clean features during model inference, and we name this method Clean Feature Mixup (CFM). Our extensive experimental results on the ImageNet-Compatible and CIFAR-10 datasets show that the proposed method outperforms the existing baselines by a clear margin. Our code is available at https://github.com/dreamflake/CFM.
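The feature-space mixing described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that); it only shows one plausible reading of "randomly mixing up stored clean features". The function name `clean_feature_mixup`, the per-channel mixing ratio, the batch shuffle, and the default values of `alpha_max` and `p` are all assumptions made for illustration and are not stated in the abstract.

```python
import numpy as np

def clean_feature_mixup(adv_feat, clean_feat, rng, alpha_max=0.75, p=1.0):
    """Blend stored clean features into the features of adversarial inputs.

    adv_feat, clean_feat: arrays of shape (batch, channels, ...) taken from
    the same layer. alpha_max and p are hypothetical defaults.
    """
    if adv_feat.shape != clean_feat.shape:
        raise ValueError("adversarial and clean features must share a shape")
    # With probability 1 - p, leave this layer's features untouched.
    if rng.random() >= p:
        return adv_feat
    batch, channels = adv_feat.shape[:2]
    # Shuffle clean features across the batch. A sample paired with another
    # image's features sees interference pulling towards a different class;
    # a sample paired with its own clean features sees a "friendly" pull
    # towards the correct class (assumed interpretation).
    perm = rng.permutation(batch)
    # One mixing ratio per sample and channel, broadcast over spatial dims.
    alpha_shape = (batch, channels) + (1,) * (adv_feat.ndim - 2)
    alpha = rng.uniform(0.0, alpha_max, size=alpha_shape)
    return (1.0 - alpha) * adv_feat + alpha * clean_feat[perm]

# Usage with random stand-ins for real layer activations.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 8, 5, 5))
adv = clean + 0.1 * rng.normal(size=clean.shape)
mixed = clean_feature_mixup(adv, clean, rng)
print(mixed.shape)
```

In an actual attack, the clean features would be recorded once from a forward pass on the unperturbed images, and the mixing would be applied inside the surrogate model at each optimization step, so that the perturbation has to succeed despite the injected interference.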
