Deep neural networks are vulnerable to adversarial examples, which often transfer across different models. Numerous approaches have been proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modification. However, these methods still exhibit limited transferability, particularly in cross-architecture scenarios such as from CNNs to ViTs. To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key parts: spatial-aware alignment and adversarial-aware alignment. First, we minimize the divergence between the two models' features in both global and local regions, facilitating spatial alignment. Second, we introduce a self-adversarial strategy that leverages adversarial examples to impose further constraints, aligning the features from an adversarial perspective. Through this alignment, the surrogate model is trained to concentrate on the common features extracted by the witness model. Adversarial attacks on these shared features then yield perturbations with enhanced transferability. Extensive experiments across various architectures on ImageNet show that surrogate models aligned with SAA produce more transferable adversarial examples, especially in cross-architecture attacks.
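The exact form of the alignment loss is not specified here; the following NumPy sketch illustrates one plausible shape of the spatial-aware alignment term described above, combining a global distance between pooled feature descriptors with a mean of local, patch-wise distances. The function name, the patch size, and the use of a squared-error distance are all assumptions for illustration, not the paper's definitive formulation.

```python
import numpy as np

def spatial_alignment_loss(f_s, f_w, patch=2):
    """Hypothetical sketch of a spatial-aware alignment term.

    f_s, f_w: feature maps of shape (C, H, W) from the surrogate and
    witness models, respectively. Returns a global term (distance between
    average-pooled descriptors) plus a local term (mean squared distance
    over non-overlapping spatial patches).
    """
    # Global alignment: compare channel-wise average-pooled descriptors.
    g_s, g_w = f_s.mean(axis=(1, 2)), f_w.mean(axis=(1, 2))
    global_term = float(np.mean((g_s - g_w) ** 2))

    # Local alignment: compare the two feature maps patch by patch.
    _, H, W = f_s.shape
    local_terms = []
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            p_s = f_s[:, i:i + patch, j:j + patch]
            p_w = f_w[:, i:i + patch, j:j + patch]
            local_terms.append(np.mean((p_s - p_w) ** 2))
    return global_term + float(np.mean(local_terms))
```

In the full method, a term of this kind would be minimized while fine-tuning the surrogate, with the adversarial-aware part additionally evaluating the same alignment on adversarial examples.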