Vision-language models like CLIP have been shown to be highly effective at linking visual perception and natural language understanding, enabling sophisticated image-text capabilities, including strong retrieval and zero-shot classification performance. Their widespread use, as well as the fact that CLIP models are trained on image-text pairs from the web, makes them both a worthwhile and relatively easy target for backdoor attacks. As training foundation models such as CLIP from scratch is very expensive, this paper focuses on cleaning potentially poisoned models via fine-tuning. We first show that existing cleaning techniques are not effective against simple structured triggers used in Blended or BadNet backdoor attacks, exposing a critical vulnerability for potential real-world deployment of these models. Then, we introduce PAR, Perturb and Recover, a surprisingly simple yet effective mechanism for removing backdoors from CLIP models. Through extensive experiments across different encoders and types of backdoor attacks, we show that PAR achieves a high backdoor removal rate while preserving good standard performance. Finally, we illustrate that our approach is effective even with only synthetic text-image pairs, i.e. without access to real training data. The code and models are available at https://github.com/nmndeep/PerturbAndRecover.