Recent vision-language foundation models, such as CLIP, have demonstrated a superior ability to learn representations that transfer across a diverse range of downstream tasks and domains. With the emergence of such powerful models, it has become crucial to leverage their capabilities effectively in tackling challenging vision tasks. Meanwhile, only a few works have focused on devising adversarial examples that transfer well to both unknown domains and unknown model architectures. In this paper, we propose a novel transfer attack method called PDCL-Attack, which leverages the CLIP model to enhance the transferability of adversarial perturbations generated by a generative model-based attack framework. Specifically, we formulate an effective prompt-driven feature guidance by harnessing the semantic representation power of text, particularly text derived from the ground-truth class labels of input images. To the best of our knowledge, we are the first to introduce prompt learning to enhance transferable generative attacks. Extensive experiments conducted across various cross-domain and cross-model settings empirically validate our approach, demonstrating its superiority over state-of-the-art methods.