Contrastive learning (CL) pre-trains general-purpose encoders on an unlabeled pre-training dataset consisting of images or image-text pairs. CL is vulnerable to data-poisoning-based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so that the resulting encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we take the first step toward analyzing the limitations of existing backdoor attacks and propose a new DPBA against CL, called CorruptEncoder. CorruptEncoder introduces a new attack strategy for creating poisoned inputs and uses a theory-guided method to maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA to achieve attack success rates above 90% with only a few (3) reference images and a small poisoning ratio (0.5%). We also propose a defense against DPBAs, called localized cropping. Our results show that this defense reduces the effectiveness of DPBAs but sacrifices the utility of the encoder, highlighting the need for new defenses.