AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning

From arXiv. This paper has been accepted by the ACM International Conference on Multimedia (ACM MM '23, October 29-November 3, 2023, Ottawa, ON, Canada).

Multimodal contrastive learning aims to train a general-purpose feature extractor, such as CLIP, on vast amounts of raw, unlabeled paired image-text data. This can greatly benefit various complex downstream tasks, including cross-modal image-text retrieval and image classification. Despite this promise, the security of cross-modal pre-trained encoders has not been fully explored, especially when the pre-trained encoder is publicly available for commercial use. In this work, we propose AdvCLIP, the first attack framework for generating downstream-agnostic adversarial examples based on cross-modal pre-trained encoders. AdvCLIP aims to construct a universal adversarial patch for a set of natural images that can fool all downstream tasks inheriting the victim cross-modal pre-trained encoder. To address the challenges of heterogeneity between different modalities and unknown downstream tasks, we first build a topological graph structure to capture the relative positions between target samples and their neighbors. Then, we design a topology-deviation based generative adversarial network to generate a universal adversarial patch. By adding the patch to images, we minimize the similarity between their embeddings and those of the other modality and perturb the sample distribution in the feature space, achieving universal non-targeted attacks. Our results demonstrate the excellent attack performance of AdvCLIP on two types of downstream tasks across eight datasets. We also tailor three popular defenses to mitigate AdvCLIP, highlighting the need for new defense mechanisms to defend cross-modal pre-trained encoders.
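
To make the quantities named in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the encoder is a hypothetical random linear projection standing in for CLIP, the patch is random rather than produced by a generative adversarial network, and the helper names (`apply_patch`, `knn_adjacency`, `topology_deviation`, `cross_modal_similarity`) are invented for illustration. It only shows how one might paste a single patch onto a batch of images, measure image-text embedding similarity, and measure how much a k-nearest-neighbor graph over the embeddings changes.

```python
import numpy as np

def apply_patch(images, patch, top, left):
    """Paste the same patch onto every image of a batch shaped (N, H, W, C)."""
    patched = images.copy()
    ph, pw = patch.shape[:2]
    patched[:, top:top + ph, left:left + pw, :] = patch
    return patched

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def knn_adjacency(features, k):
    """Boolean matrix whose row i marks the k nearest neighbors of sample i."""
    sim = cosine_similarity_matrix(features, features)
    np.fill_diagonal(sim, -np.inf)             # a sample is not its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]      # indices of the k most similar samples
    adj = np.zeros(sim.shape, dtype=bool)
    adj[np.arange(len(features))[:, None], idx] = True
    return adj

def topology_deviation(clean_feats, adv_feats, k):
    """Fraction of clean k-NN edges that are absent from the adversarial k-NN graph."""
    clean_adj = knn_adjacency(clean_feats, k)
    adv_adj = knn_adjacency(adv_feats, k)
    return 1.0 - (clean_adj & adv_adj).sum() / clean_adj.sum()

def cross_modal_similarity(image_feats, text_feats):
    """Mean cosine similarity between each image embedding and its paired text embedding."""
    return float(np.mean(np.diag(cosine_similarity_matrix(image_feats, text_feats))))

# Toy setup (all values are assumptions made for this sketch).
rng = np.random.default_rng(0)
images = rng.random((16, 8, 8, 3))             # 16 small RGB "images"
projection = rng.standard_normal((8 * 8 * 3, 32))

def toy_encoder(batch):
    """Hypothetical stand-in for a pre-trained image encoder."""
    return batch.reshape(len(batch), -1) @ projection

patch = rng.random((3, 3, 3))                  # random patch, not an optimized one
patched = apply_patch(images, patch, top=0, left=0)

clean_feats = toy_encoder(images)
adv_feats = toy_encoder(patched)
# Pretend text embeddings: noisy copies of the clean image embeddings.
text_feats = clean_feats + 0.1 * rng.standard_normal(clean_feats.shape)

print("clean image-text similarity:  ", cross_modal_similarity(clean_feats, text_feats))
print("patched image-text similarity:", cross_modal_similarity(adv_feats, text_feats))
print("topology deviation (k=3):     ", topology_deviation(clean_feats, adv_feats, k=3))
```

In an attack of the kind the abstract describes, the patch would presumably be optimized so that the image-text similarity drops and the neighbor structure shifts. With the random patch used here, the printed numbers only illustrate how the metrics are computed and say nothing about attack strength.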
