As a new paradigm in machine learning, self-supervised learning (SSL) can learn high-quality representations of complex data without relying on labels. Beyond eliminating the need for labeled data, prior research has found that SSL improves adversarial robustness relative to supervised learning, since the absence of labels makes it harder for adversaries to manipulate model predictions. However, the extent to which this robustness advantage generalizes to other types of attacks remains an open question. We explore this question in the context of backdoor attacks. Specifically, we design and evaluate CTRL, an embarrassingly simple yet highly effective self-supervised backdoor attack. By polluting only a tiny fraction of the training data (≤ 1%) with indistinguishable poisoning samples, CTRL causes any trigger-embedded input to be misclassified into the adversary's designated class with high probability (≥ 99%) at inference time. Our findings suggest that SSL and supervised learning are comparably vulnerable to backdoor attacks. More importantly, through the lens of CTRL, we study the inherent vulnerability of SSL to backdoor attacks. With both empirical and analytical evidence, we reveal that the representation invariance property of SSL, which benefits adversarial robustness, may also be the very reason that SSL is highly susceptible to backdoor attacks. Our findings also imply that existing defenses against supervised backdoor attacks are not easily retrofitted to the unique vulnerability of SSL.
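The two figures quoted in the abstract, the poisoning budget (≤ 1% of the training data) and the attack success rate (≥ 99%), can be made concrete with a minimal sketch. The function names `poison_budget` and `attack_success_rate` are hypothetical and not taken from the CTRL implementation. Excluding inputs whose true label already equals the target class is a common evaluation convention assumed here; the abstract does not state it.

```python
import math


def poison_budget(num_train: int, poison_rate: float = 0.01) -> int:
    """Largest number of poisoned samples allowed under a given poisoning rate.

    Hypothetical helper: the 1% default mirrors the "<= 1%" figure in the abstract.
    """
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must lie in [0, 1]")
    return math.floor(num_train * poison_rate)


def attack_success_rate(predictions, true_labels, target_class):
    """Fraction of trigger-embedded inputs classified as the target class.

    Assumption (not from the source): inputs whose true label is already the
    target class are excluded, since they cannot be "misclassified" into it.
    """
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_class]
    if not eligible:
        raise ValueError("no eligible inputs outside the target class")
    hits = sum(1 for p in eligible if p == target_class)
    return hits / len(eligible)


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from the paper.
    print(poison_budget(50_000))
    print(attack_success_rate([3, 3, 3, 1, 3], [0, 1, 2, 4, 3], target_class=3))
```

Under this convention, an attack meets the abstract's stated bar when `attack_success_rate(...)` is at least 0.99 while the number of poisoned samples stays within `poison_budget(...)`.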