Backdoor attacks pose a new security threat to deep neural networks. Existing backdoors often rely on a visible, universal trigger to make the backdoored model malfunction; such triggers are not only visually suspicious to humans but also catchable by mainstream countermeasures. We propose an imperceptible, sample-specific backdoor in which the trigger varies from sample to sample and is invisible. Trigger generation is automated through a denoising autoencoder fed with delicate but pervasive features (i.e., the edge patterns of each image). We extensively evaluate our backdoor attack on ImageNet and MS-Celeb-1M, demonstrating a stable and nearly 100% (i.e., 99.8%) attack success rate with negligible impact on the clean-data accuracy of the infected model. The denoising-autoencoder-based trigger generator is reusable or transferable across tasks (e.g., from ImageNet to MS-Celeb-1M), while each trigger is highly exclusive (i.e., a trigger generated for one sample is not applicable to another). Moreover, the backdoored model achieves high evasiveness against mainstream backdoor defenses such as Neural Cleanse, STRIP, SentiNet and Fine-Pruning.
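The pipeline described above can be illustrated with a minimal NumPy sketch: an edge map is extracted from each input and passed through a small autoencoder-style network to produce a low-amplitude, sample-specific trigger. This is a hypothetical stand-in, not the paper's implementation: the `TinyAutoencoder` here uses random (untrained) weights, the gradient-based `edge_map` and the `eps` amplitude bound are illustrative choices, and the real system would train the denoising autoencoder on edge patterns from the task's data.

```python
import numpy as np

def edge_map(img):
    # Central-difference gradients as a stand-in for the
    # "delicate but pervasive" edge features of each image.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

class TinyAutoencoder:
    # Untrained, single-hidden-layer stand-in for the paper's
    # denoising autoencoder; weights are random for illustration.
    def __init__(self, side, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        d = side * side
        self.We = rng.normal(scale=0.1, size=(d, hidden))  # encoder
        self.Wd = rng.normal(scale=0.1, size=(hidden, d))  # decoder
        self.side = side

    def __call__(self, x):
        h = np.tanh(x.ravel() @ self.We)
        return (h @ self.Wd).reshape(self.side, self.side)

def apply_trigger(img, ae, eps=0.03):
    # The trigger is derived from the image's own edge map, so it
    # varies per sample; eps keeps the perturbation imperceptible.
    t = ae(edge_map(img))
    t = eps * t / (np.abs(t).max() + 1e-8)
    return np.clip(img + t, 0.0, 1.0)
```

Because the trigger is a function of each image's own edges, a trigger computed for one sample does not transfer to another, which is the exclusiveness property claimed in the abstract.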