Contrastive Learning (CL) has attracted enormous attention due to its remarkable capability in unsupervised representation learning. However, recent works have revealed that CL is vulnerable to backdoor attacks: the feature extractor can be misled into embedding backdoored data close to an attack target class, thus fooling the downstream predictor into misclassifying it as the target. Existing attacks usually adopt a fixed trigger pattern and poison the training set with trigger-injected data, hoping that the feature extractor learns the association between the trigger and the target class. However, we find that such a fixed trigger design fails to effectively associate trigger-injected data with the target class in the embedding space due to special CL mechanisms, leading to a limited attack success rate (ASR). This phenomenon motivates us to find a better backdoor trigger design tailored to the CL framework. In this paper, we propose a bi-level optimization approach to achieve this goal, where the inner optimization simulates the CL dynamics of a surrogate victim, and the outer optimization forces the backdoor trigger to stay close to the target throughout the surrogate CL procedure. Extensive experiments show that our attack achieves a higher attack success rate (e.g., $99\%$ ASR on ImageNet-100) with a very low poisoning rate ($1\%$). Moreover, our attack can effectively evade existing state-of-the-art defenses. Code is available at: https://github.com/SWY666/SSL-backdoor-BLTO.
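As a reading aid, the bi-level structure described in the abstract can be written schematically as follows. This is only a sketch under assumed notation, not the paper's exact objective: $f_\theta$ denotes the surrogate feature extractor, $g_\psi$ a trigger injector with learnable parameters $\psi$, $\mathcal{D}$ the unlabeled training set, $\mathcal{D}_t$ a set of reference samples from the target class, $\mathcal{D}_{\mathrm{p}}(\psi)$ the trigger-injected poisoned data, and $\mathcal{L}_{\mathrm{CL}}$ a generic contrastive loss. None of these symbols appear in the abstract itself, and cosine similarity is an assumed choice of closeness measure.

$$
\max_{\psi}\ \mathbb{E}_{x\sim\mathcal{D},\,x_t\sim\mathcal{D}_t}\Big[\cos\big(f_{\theta^{\ast}(\psi)}(g_\psi(x)),\ f_{\theta^{\ast}(\psi)}(x_t)\big)\Big]
\quad\text{s.t.}\quad
\theta^{\ast}(\psi)\in\arg\min_{\theta}\ \mathcal{L}_{\mathrm{CL}}\big(\theta;\ \mathcal{D}\cup\mathcal{D}_{\mathrm{p}}(\psi)\big)
$$

The constraint corresponds to the inner optimization (the surrogate victim's CL training), and the maximization corresponds to the outer optimization (keeping trigger-injected embeddings close to the target). Because the abstract asks for closeness "throughout the surrogate CL procedure", a practical approximation would presumably alternate a few gradient steps on $\theta$ with a step on $\psi$ rather than solving the inner problem to convergence; that alternating schedule is likewise an assumption here.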