Contrastive learning (CL) pre-trains general-purpose encoders on an unlabeled pre-training dataset consisting of images or image-text pairs. CL is vulnerable to data-poisoning-based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so that the resulting encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we propose CorruptEncoder, a new DPBA against CL. CorruptEncoder uses a theory-guided method to create optimal poisoned inputs that maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA to achieve attack success rates above 90% with only a few (3) reference images and a small poisoning ratio (0.5%). We also propose a defense against DPBAs, called localized cropping. Our results show that this defense reduces the effectiveness of DPBAs, though it slightly sacrifices the utility of the encoder.