Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks, wherein malicious functions are injected into target models and activated only by specific triggers. However, how contrastive backdoor attacks fundamentally differ from their supervised counterparts remains under-explored, which impedes the development of effective defenses against this emerging threat. This work represents a solid step toward answering this critical question. Specifically, we define TRL, a unified framework that encompasses both supervised and contrastive backdoor attacks. Through the lens of TRL, we find that the two types of attacks operate through distinct mechanisms: in supervised attacks, the benign and backdoor tasks tend to be learned independently, whereas in contrastive attacks, the two tasks are deeply intertwined, both in their representations and throughout their learning processes. This distinction gives rise to different learning dynamics and feature distributions for supervised and contrastive attacks. More importantly, we show that the specificities of contrastive backdoor attacks have important implications for defense: existing defenses for supervised attacks are often inadequate against contrastive attacks and are not easily retrofitted to them. We also explore several alternative defenses and discuss their potential challenges. Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks, pointing to promising directions for future research.