Contrastive learning (CL) pre-trains general-purpose encoders on an unlabeled pre-training dataset consisting of images or image-text pairs. CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so that the resulting encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we take the first step toward analyzing the limitations of existing attacks and propose CorruptEncoder, a new DPBA against CL. CorruptEncoder uses a theory-guided method to create optimal poisoned inputs that maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA to achieve attack success rates above 90% with only a few (3) reference images and a small poisoning ratio (0.5%). We also propose a defense against DPBAs, called localized cropping. Our results show that this defense reduces the effectiveness of DPBAs but sacrifices the utility of the encoder, highlighting the need for new defenses.