Deep neural networks (DNNs) are vulnerable to backdoor attacks, which leave the network's performance on clean data unaffected but manipulate its behavior once a trigger pattern is added to the input. Existing defense methods have greatly reduced the attack success rate, but their prediction accuracy on clean data still lags behind that of a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attacks, we propose a simple but highly effective defense framework that injects non-adversarial backdoors targeting poisoned samples. Following the general steps of a backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. The results demonstrate that our method achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data. Considering the surprising defense ability displayed by our framework, we call for more attention to using backdoors for backdoor defense. Code is available at https://github.com/damianliumin/non-adversarial_backdoor.
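The preprocessing step described above (detect a small set of suspected samples, then apply a poisoning strategy to them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the corner-patch `stamp`, the helper name `inject_defensive_backdoor`, the choice to relabel suspected samples with externally supplied `pseudo_labels`, and the hard-coded `suspected_idx` standing in for a real detection method. The released code linked above is the reference for the actual procedure.

```python
import numpy as np


def stamp(images, size=2, value=1.0):
    """Return a copy of `images` (N, H, W, C) with a size x size patch in the
    top-left corner set to `value`. Hypothetical defender-chosen trigger."""
    out = images.copy()
    out[:, :size, :size, :] = value
    return out


def inject_defensive_backdoor(images, labels, suspected_idx, pseudo_labels):
    """Stamp and relabel only the suspected samples; leave the rest untouched.

    Assumption: suspected samples are relabeled with `pseudo_labels`, so the
    stamp becomes associated with a label other than the attacker's target.
    """
    new_images = images.copy()
    new_labels = labels.copy()
    new_images[suspected_idx] = stamp(images[suspected_idx])
    new_labels[suspected_idx] = pseudo_labels
    return new_images, new_labels


# Toy data: 8 images of shape 4x4x3 with pixel values in [0, 1), all labeled 0.
rng = np.random.default_rng(0)
images = rng.random((8, 4, 4, 3)).astype(np.float32)
labels = np.zeros(8, dtype=np.int64)

# Placeholder for the output of a detection step (not implemented here).
suspected_idx = np.array([1, 5])
pseudo_labels = np.array([3, 7])

new_images, new_labels = inject_defensive_backdoor(
    images, labels, suspected_idx, pseudo_labels
)
```

Because the function only rewrites the dataset arrays, its output can be passed to an unchanged training loop, which matches the claim that the defense lives entirely in data preprocessing.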