Web-scraped datasets are vulnerable to data poisoning, which can be used to backdoor deep image classifiers during training. Because training on large datasets is expensive, a model is typically trained once and reused many times. Unlike adversarial examples, backdoor attacks often target specific classes rather than any class learned by the model. One might expect that targeting many classes through a naive composition of attacks would vastly increase the number of poison samples required. We show that this is not necessarily true: more efficient, universal data poisoning attacks exist that allow an attacker to control misclassifications from any source class into any target class with only a small increase in poison samples. Our idea is to generate triggers with salient characteristics that the model can learn. The triggers we craft exploit a phenomenon we call inter-class poison transferability, where learning a trigger from one class makes the model more vulnerable to learning triggers for other classes. We demonstrate the effectiveness and robustness of our universal backdoor attacks by controlling models with up to 6,000 classes while poisoning only 0.15% of the training dataset. Our source code is available at https://github.com/Ben-Schneider-code/Universal-Backdoor-Attacks.
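To give a sense of scale for the reported poison rate, the following minimal sketch converts a poison rate into an absolute sample budget and an average number of poison samples per class. The function name `poison_budget` and the dataset size of 6,000,000 images are hypothetical and chosen only for illustration; the 0.15% rate and the 6,000 classes are the only figures taken from the abstract, and this is not the authors' code.

```python
from typing import Tuple


def poison_budget(dataset_size: int, poison_rate: float, num_classes: int) -> Tuple[int, float]:
    """Return (total poison samples, average poison samples per class).

    Hypothetical helper for illustration; not part of the authors' repository.
    """
    if dataset_size <= 0 or num_classes <= 0:
        raise ValueError("dataset_size and num_classes must be positive")
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be a fraction in [0, 1]")
    # Round to the nearest whole sample, since a sample cannot be partly poisoned.
    total = round(dataset_size * poison_rate)
    return total, total / num_classes


# ASSUMPTION: 6,000,000 training images is a made-up dataset size.
# 0.0015 (0.15%) and 6,000 classes are the figures quoted in the abstract.
total, per_class = poison_budget(6_000_000, 0.0015, 6_000)
print(f"total poison samples: {total}, average per class: {per_class}")
```

Under the assumed dataset size, the budget is 9,000 poison samples, or 1.5 per class on average. A budget this small per class suggests that each target class cannot depend on a large dedicated set of poison samples, which is where the inter-class poison transferability described above would matter.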