In computer vision, self-supervised learning (SSL) pre-trains image encoders on large quantities of unlabeled images. A pre-trained image encoder can then serve as a feature extractor on which downstream classifiers for various tasks are built. However, the adoption of SSL has also prompted growing security research into backdoor attacks. The trigger patterns used in existing backdoor attacks on SSL are mostly visible or static (sample-agnostic), which makes the backdoors less covert and adversely affects attack performance. In this work, we propose GhostEncoder, the first dynamic invisible backdoor attack on SSL. Unlike existing backdoor attacks on SSL, GhostEncoder uses image steganography to encode hidden information into benign images and thereby generate backdoor samples. We then fine-tune the pre-trained image encoder on a manipulation dataset to inject the backdoor, so that downstream classifiers built on the backdoored encoder inherit the backdoor behavior for the target downstream tasks. We evaluate GhostEncoder on three downstream tasks, and the results demonstrate that it is practically stealthy at the image level and deceives the victim model with a high attack success rate without compromising the model's utility. Furthermore, GhostEncoder withstands state-of-the-art defenses, including STRIP, STRIP-Cl, and SSL-Cleanse.
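To make the idea of an invisible, sample-specific trigger concrete, the following is a minimal sketch that hides a bit string in the least significant bits of an image. It is an illustration only: the abstract does not state which steganography method GhostEncoder uses, so classical LSB embedding is swapped in here as a stand-in, and the function names `embed_bits` and `extract_bits` are hypothetical. The point it shows is that each pixel changes by at most one intensity level, and that the resulting perturbation depends on the carrier image rather than being a fixed patch.

```python
import numpy as np


def embed_bits(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide a 0/1 bit array in the least significant bits of a uint8 image.

    Assumption: plain LSB embedding, used only as a stand-in for whatever
    steganography method the attack actually employs.
    """
    if image.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    flat = image.flatten()  # flatten() returns a copy, so the input is untouched
    if bits.size > flat.size:
        raise ValueError("message is longer than the image capacity")
    # Clear the lowest bit of the first len(bits) values, then write the message.
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits.astype(np.uint8)
    return flat.reshape(image.shape)


def extract_bits(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Read back the first n_bits hidden by embed_bits."""
    return image.flatten()[:n_bits] & 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    benign = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    secret = rng.integers(0, 2, size=100, dtype=np.uint8)

    backdoor_sample = embed_bits(benign, secret)
    residual = backdoor_sample.astype(int) - benign.astype(int)

    print("message recovered:", np.array_equal(extract_bits(backdoor_sample, 100), secret))
    print("largest pixel change:", np.abs(residual).max())
```

Because the residual is determined by the original pixel values, two different benign images carrying the same message receive different perturbations, which is the sense in which such a trigger is sample-specific rather than static.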