Deep neural networks (DNNs) are vulnerable to backdoor attacks, which leave the network's performance on clean data unaffected but manipulate its behavior once a trigger pattern is added to the input. Existing defense methods have greatly reduced the attack success rate, but their prediction accuracy on clean data still lags behind that of a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attacks, we propose a simple but highly effective defense framework that injects non-adversarial backdoors targeting poisoned samples. Following the general steps of a backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. The results demonstrate that our method achieves state-of-the-art defense effectiveness while incurring by far the lowest performance drop on clean data. Considering the surprising defense ability displayed by our framework, we call for more attention to utilizing backdoors for backdoor defense. Code is available at https://github.com/damianliumin/non-adversarial_backdoor.
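The preprocessing step described above (flag a small set of suspected samples, then poison them with the defender's own backdoor) can be illustrated with a minimal sketch. The abstract does not specify the trigger, the relabeling rule, or the detector, so everything concrete here is an assumption made for illustration: the square corner patch, the function names `stamp_trigger` and `inject_defensive_backdoor`, and the idea that suspected samples receive externally supplied pseudo labels. Detection itself is not implemented; `suspected_idx` is taken as given.

```python
import numpy as np


def stamp_trigger(image, patch_value=1.0, patch_size=3):
    """Return a copy of `image` (H, W, C) with a square patch in the
    bottom-right corner. The patch is a hypothetical defender trigger."""
    stamped = image.copy()
    stamped[-patch_size:, -patch_size:, :] = patch_value
    return stamped


def inject_defensive_backdoor(images, labels, suspected_idx, pseudo_labels):
    """Stamp the defender's trigger on suspected samples and relabel them.

    images:        float array of shape (N, H, W, C)
    labels:        int array of shape (N,)
    suspected_idx: indices flagged by some detector (assumed input)
    pseudo_labels: replacement labels for those indices (assumed input)

    Returns new arrays; the inputs are left unmodified.
    """
    new_images = images.copy()
    new_labels = labels.copy()
    for i, y in zip(suspected_idx, pseudo_labels):
        new_images[i] = stamp_trigger(images[i])
        new_labels[i] = y
    return new_images, new_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((10, 8, 8, 3)).astype(np.float32)
    y = np.zeros(10, dtype=np.int64)
    # Pretend a detector flagged samples 1 and 4.
    x_def, y_def = inject_defensive_backdoor(x, y, [1, 4], [2, 3])
    print(y_def)
```

Because the sketch only rewrites the dataset, the resulting arrays can be passed to an unchanged training loop, which matches the abstract's point that the defense lives in data preprocessing. How the trigger is used at inference time is not covered here.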