Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is far more vulnerable to targeted data poisoning and backdoor attacks than supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful. This is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods offer only limited protection for CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) to the image and text modalities separately. Then, it carefully divides the data into safe and risky subsets. SAFECLIP trains on the risky data by applying unimodal CL to the image and text modalities separately, and trains on the safe data using the CLIP loss. By gradually increasing the size of the safe subset during training, SAFECLIP effectively breaks targeted data poisoning and backdoor attacks without harming CLIP performance. Our extensive experiments show that SAFECLIP decreases the attack success rate of targeted data poisoning attacks from 93.75% to 0% and that of backdoor attacks from 100% to 0%, without harming CLIP performance on various datasets.
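The safe/risky division and the growing safe subset described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: pairs are ranked by the cosine similarity between their image and text embeddings, the most similar pairs are treated as safe, and the safe fraction grows linearly with the epoch. The function names (`safe_fraction`, `split_safe_risky`, `plan_epoch`) and the schedule values (`start`, `step`) are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def safe_fraction(epoch, start=0.15, step=0.01, cap=1.0):
    """Hypothetical linear schedule: the share of data treated as safe
    grows by `step` each epoch and never exceeds `cap`."""
    return min(cap, start + step * epoch)


def split_safe_risky(image_embs, text_embs, fraction):
    """Assumed criterion: the `fraction` of pairs with the highest
    image-text cosine similarity is 'safe', the rest is 'risky'.
    Returns two sorted lists of pair indices."""
    sims = [cosine(i, t) for i, t in zip(image_embs, text_embs)]
    # Rank pair indices from most to least similar.
    order = sorted(range(len(sims)), key=lambda k: sims[k], reverse=True)
    n_safe = int(fraction * len(order))
    return sorted(order[:n_safe]), sorted(order[n_safe:])


def plan_epoch(image_embs, text_embs, epoch):
    """Map each loss to the pair indices it would be applied to:
    CLIP loss on the safe pairs, unimodal CL on the risky pairs."""
    safe, risky = split_safe_risky(image_embs, text_embs, safe_fraction(epoch))
    return {"clip_loss": safe, "unimodal_cl": risky}
```

In a real training loop the embeddings would come from the current encoders and the split would be recomputed as the model changes; here they are plain lists so the partitioning logic can be read in isolation.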