Deep neural networks (DNNs) are vulnerable to backdoor attacks, which leave the network's performance on clean data intact but manipulate its behavior once a trigger pattern is added to the input. Existing defense methods have greatly reduced the attack success rate, but their prediction accuracy on clean data still lags behind that of a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attacks, we propose a simple but highly effective defense framework that injects non-adversarial backdoors targeting poisoned samples. Following the general steps of a backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. Results demonstrate that our method achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data. Given the surprising defense ability displayed by our framework, we call for more attention to using backdoors for backdoor defense. Code is available at https://github.com/damianliumin/non-adversarial_backdoor.
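To make the preprocessing step concrete, the following is a minimal sketch of one possible reading of "detect a small set of suspected samples and then apply a poisoning strategy to them". The abstract does not specify the trigger, the detector, or how labels are handled, so everything here is an assumption for illustration: the corner-patch trigger, the function names `stamp_trigger` and `inject_defensive_backdoor`, the `suspected` mask (standing in for the output of some detector), and the `pseudo_labels` array (standing in for replacement labels). It is not the authors' implementation; see their repository for that.

```python
import numpy as np


def stamp_trigger(images, patch_value=255, patch_size=2):
    """Return a copy of `images` (N, H, W, C) with a square patch in the top-left corner.

    The patch is a hypothetical defensive trigger chosen for illustration.
    """
    stamped = images.copy()
    stamped[:, :patch_size, :patch_size, :] = patch_value
    return stamped


def inject_defensive_backdoor(images, labels, suspected, pseudo_labels):
    """Stamp the defensive trigger on suspected samples and relabel them.

    suspected:     boolean mask of shape (N,), assumed to come from some detector.
    pseudo_labels: int array of shape (N,); only entries where `suspected`
                   is True are used (assumed replacement labels).
    """
    new_images = images.copy()
    new_labels = labels.copy()
    new_images[suspected] = stamp_trigger(images[suspected])
    new_labels[suspected] = pseudo_labels[suspected]
    return new_images, new_labels


# Toy data: 8 tiny RGB images, all labeled 0, with pixel values below 200.
rng = np.random.default_rng(0)
images = rng.integers(0, 200, size=(8, 4, 4, 3), dtype=np.uint8)
labels = np.zeros(8, dtype=np.int64)

# Assume a detector flagged samples 0 and 3, and that label 3 is the replacement.
suspected = np.array([True, False, False, True, False, False, False, False])
pseudo_labels = np.full(8, 3, dtype=np.int64)

new_images, new_labels = inject_defensive_backdoor(
    images, labels, suspected, pseudo_labels
)
```

Because the function only rewrites the dataset, its output can be fed to an unchanged training loop, which matches the claim that the defense lives entirely in data preprocessing. Under this reading, the same stamp would presumably be applied to inputs at inference time so that the defensive backdoor is triggered; the abstract does not spell this out.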