Backdoor attacks aim to make a victim model misbehave on backdoor instances while maintaining its performance on benign data. Current methods use hand-crafted patterns or special perturbations as triggers, but they often overlook robustness against data corruption, which makes these attacks easy to defend against in practice. To address this issue, we propose a novel backdoor attack method named Spy-Watermark, which remains effective under data collapse and backdoor defenses. Specifically, we introduce a learnable watermark, embedded in the latent domain of images, that serves as the trigger. We then search for a watermark that can withstand collapse during image decoding, and combine it with several anti-collapse operations to further enhance the resilience of the trigger against data corruption. Extensive experiments on the CIFAR10, GTSRB, and ImageNet datasets demonstrate that Spy-Watermark outperforms ten state-of-the-art methods in terms of robustness and stealthiness.
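As a rough illustration of the general setup the abstract describes (a trigger is stamped onto a fraction of the training data, those samples are relabeled to a target class, and the triggered samples are corrupted so that the trigger is learned under corruption), the following Python sketch may help. It is not the Spy-Watermark method: the learned latent-domain watermark is replaced here by a simple additive blend in pixel space, the anti-collapse operations are reduced to Gaussian noise plus coarse quantization, and all names (`add_trigger`, `corrupt`, `poison_dataset`) and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Image = List[float]  # toy stand-in: a flattened image with values in [0, 1]


def add_trigger(image: Image, watermark: Sequence[float], strength: float = 0.1) -> Image:
    """Additively blend a watermark into an image and clip to [0, 1].

    Assumption: this pixel-space blend only stands in for the learned
    latent-domain embedding described in the abstract.
    """
    return [min(1.0, max(0.0, p + strength * w)) for p, w in zip(image, watermark)]


def corrupt(image: Image, rng: random.Random, noise_std: float = 0.05, levels: int = 16) -> Image:
    """Toy corruption: Gaussian noise followed by coarse quantization."""
    noisy = [min(1.0, max(0.0, p + rng.gauss(0.0, noise_std))) for p in image]
    return [round(p * (levels - 1)) / (levels - 1) for p in noisy]


def poison_dataset(
    images: Sequence[Image],
    labels: Sequence[int],
    watermark: Sequence[float],
    target_label: int,
    poison_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Image], List[int], Set[int]]:
    """Return a copy of the dataset in which a random subset is backdoored.

    Selected samples get the trigger, are corrupted, and are relabeled to
    `target_label`; all other samples are copied unchanged.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(images)), n_poison))

    out_images: List[Image] = []
    out_labels: List[int] = []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in poisoned_idx:
            out_images.append(corrupt(add_trigger(img, watermark), rng))
            out_labels.append(target_label)
        else:
            out_images.append(list(img))
            out_labels.append(lab)
    return out_images, out_labels, poisoned_idx
```

A model trained on the returned data would see the trigger only in corrupted form, which is one simple way to encourage robustness of the backdoor to corruption at test time; the actual method instead optimizes the watermark itself.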