Models leveraging both visual and textual data, such as Contrastive Language-Image Pre-training (CLIP), are the backbone of many recent advances in artificial intelligence. In this work, we show that despite their versatility, such models are vulnerable to what we refer to as fooling master images. Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts, while, to humans, being either unrecognizable or unrelated to the attacked prompts. The existence of such images is problematic: bad actors could use them to maliciously interfere with CLIP-trained image retrieval models in production with comparably little effort, since a single image can attack many different prompts. We demonstrate how fooling master images for CLIP (CLIPMasterPrints) can be mined using stochastic gradient descent, projected gradient descent, or black-box optimization. In contrast to many common adversarial attacks, the black-box optimization approach allows us to mine CLIPMasterPrints even when the weights of the model are not accessible. We investigate the properties of the mined images and find that images optimized for a small number of image captions generalize to a much larger number of semantically related captions. We evaluate possible mitigation strategies: we increase the robustness of the model and introduce an approach to automatically detect CLIPMasterPrints in order to sanitize the input of vulnerable models. Finally, we find that the vulnerability to CLIPMasterPrints is related to a modality gap in contrastive pre-trained multi-modal networks. Code is available at https://github.com/matfrei/CLIPMasterPrints.
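The projected-gradient mining procedure described above can be sketched as follows. This is a minimal, self-contained illustration using a toy random linear "image encoder" as a stand-in for CLIP's image tower; all names (`W_img`, `text_emb`, dimensions, step size) are illustrative assumptions, not the paper's actual setup, which optimizes pixel values against real CLIP encoders. The core idea carries over unchanged: ascend the average cosine similarity between one candidate image and the embeddings of all attacked prompts, projecting back into the valid pixel range after each step.

```python
import numpy as np

# Toy stand-ins (hypothetical): a random linear "image encoder" and
# unit-norm "text embeddings" for the prompts under attack.
rng = np.random.default_rng(0)
D_IMG, D_EMB, N_PROMPTS = 64, 16, 5
W_img = rng.normal(size=(D_EMB, D_IMG)) / np.sqrt(D_IMG)
text_emb = rng.normal(size=(N_PROMPTS, D_EMB))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

def embed(x):
    """L2-normalized embedding of the candidate image x."""
    z = W_img @ x
    return z / np.linalg.norm(z)

def mean_cosine(x):
    """Objective: average cosine similarity to all attacked prompts."""
    return float((text_emb @ embed(x)).mean())

def grad_mean_cosine(x):
    """Analytic gradient of mean_cosine w.r.t. x (chain rule through the L2 norm)."""
    z = W_img @ x
    n = np.linalg.norm(z)
    u = z / n
    t_bar = text_emb.mean(axis=0)
    g_z = (t_bar - u * (u @ t_bar)) / n  # remove the radial component of the gradient
    return W_img.T @ g_z

# Projected gradient ascent: step uphill on the mean similarity,
# then clip the "pixels" back into a valid range.
x = 0.1 * rng.normal(size=D_IMG)
start = mean_cosine(x)
for _ in range(500):
    x += 0.5 * grad_mean_cosine(x)
    x = np.clip(x, -1.0, 1.0)
```

Under cosine similarity, the best any single image can do is align its embedding with the normalized mean of the attacked text embeddings, so the attained objective approaches the norm of that mean; with a real CLIP model, `embed` is the image encoder and `x` the pixel tensor, but the loop is the same.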