Beating Backdoor Attack at Its Own Game

Deep neural networks (DNNs) are vulnerable to backdoor attacks, which leave the network's performance on clean data unaffected but manipulate its behavior once a trigger pattern is added to the input. Existing defense methods have greatly reduced the attack success rate, but their prediction accuracy on clean data still lags behind that of a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attacks, we propose a simple but highly effective defense framework that injects non-adversarial backdoors targeting poisoned samples. Following the general steps of a backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. The results demonstrate that our method achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data. Considering the surprising defense ability displayed by our framework, we call for more attention to utilizing backdoors for backdoor defense. Code is available at https://github.com/damianliumin/non-adversarial_backdoor.
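The preprocessing step described in the abstract (detect a small set of suspected samples, then apply a poisoning strategy to them) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the corner-patch stamp, the relabeling with pseudo labels, and the names `stamp` and `inject_non_adversarial_backdoor` are all hypothetical, and the detection step is assumed to have already produced the `suspected` indices.

```python
def stamp(image, value=1.0, size=2):
    """Return a copy of a 2-D grayscale image (list of rows) with a
    size x size patch written into the top-left corner.
    The patch shape and position are assumptions made for illustration."""
    out = [row[:] for row in image]
    for r in range(min(size, len(out))):
        for c in range(min(size, len(out[r]))):
            out[r][c] = value
    return out


def inject_non_adversarial_backdoor(images, labels, suspected, pseudo_labels):
    """Apply a hypothetical defensive poisoning strategy to suspected samples.

    images        -- list of 2-D grayscale images (lists of rows)
    labels        -- list of integer class labels, same length as images
    suspected     -- indices flagged by some detection step (assumed given)
    pseudo_labels -- dict mapping each suspected index to a replacement label
                     (how these are obtained is outside this sketch)

    Returns new lists; the inputs are left unmodified.
    """
    new_images = list(images)
    new_labels = list(labels)
    for i in suspected:
        new_images[i] = stamp(images[i])   # add the defender's trigger
        new_labels[i] = pseudo_labels[i]   # relabel the suspected sample
    return new_images, new_labels


if __name__ == "__main__":
    # Three 3x3 all-zero images; sample 1 is flagged as suspected.
    imgs = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    labs = [0, 1, 0]
    out_imgs, out_labs = inject_non_adversarial_backdoor(imgs, labs, [1], {1: 2})
    print(out_labs)
    print(out_imgs[1])
```

Because the function only rewrites the dataset, the resulting images and labels could be passed to an unchanged training loop, which matches the abstract's point that the defense fits into data preprocessing.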

