Current Visual-Language Pre-training (VLP) models are vulnerable to adversarial examples, which exploit inherent weaknesses in the models to induce incorrect predictions and therefore pose substantial security risks. Compared with white-box adversarial attacks, transfer attacks (where the adversary crafts adversarial examples on a white-box model to fool a different black-box model) better reflect real-world scenarios and are thus more meaningful to study. By summarizing and analyzing existing research, we identify two factors that influence the efficacy of transfer attacks on VLP models: inter-modal interaction and data diversity. Based on these insights, we propose a self-augment-based transfer attack method, termed SA-Attack. Specifically, when generating adversarial images and adversarial texts, we apply separate data augmentation methods to the image modality and the text modality, with the aim of improving the transferability of the generated adversarial images and texts. Experiments on the Flickr30K and COCO datasets validate the effectiveness of our method. Our code will be made available after this paper is accepted.
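The idea of augmenting inputs while crafting adversarial examples can be illustrated with a minimal sketch. This is not the SA-Attack implementation, which the abstract does not specify. It assumes a toy one-layer ReLU surrogate in place of a VLP model, uses input scaling as a stand-in augmentation, and covers only the image side. All names (`similarity`, `augmented_attack`, `scales`, and the default hyperparameters) are hypothetical and chosen for illustration.

```python
def similarity(image, weights, biases):
    """Toy surrogate score standing in for image-text similarity (assumption)."""
    return sum(w * max(0.0, x - b) for x, w, b in zip(image, weights, biases))


def similarity_grad(image, weights, biases):
    """Gradient of the toy score with respect to each input value."""
    return [w if x > b else 0.0 for x, w, b in zip(image, weights, biases)]


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def augmented_attack(image, weights, biases, eps=0.1, step=0.02, iters=10,
                     scales=(1.0, 0.75, 0.5)):
    """Iterative sign-gradient attack that averages gradients over scaled copies.

    Scaling is an assumed stand-in for the image augmentations; the perturbation
    is kept inside an L-infinity ball of radius eps and the valid range [0, 1].
    """
    adv = list(image)
    for _ in range(iters):
        avg_grad = [0.0] * len(adv)
        for s in scales:
            scaled = [s * x for x in adv]
            grad = similarity_grad(scaled, weights, biases)
            # Chain rule: d/dx f(s * x) = s * f'(s * x), averaged over all scales.
            for i, g in enumerate(grad):
                avg_grad[i] += s * g / len(scales)
        # Step against the gradient to lower the similarity score.
        stepped = [x - step * sign(g) for x, g in zip(adv, avg_grad)]
        # Project back into the eps-ball around the clean input and into [0, 1].
        adv = [min(max(a, x0 - eps, 0.0), x0 + eps, 1.0)
               for a, x0 in zip(stepped, image)]
    return adv
```

Averaging the gradient over several augmented views is one common way to reduce overfitting of the perturbation to the surrogate model; whether SA-Attack uses this exact scheme is not stated in the abstract.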