Deep neural networks (DNNs) are vulnerable to backdoor attacks, in which a backdoored model behaves normally on clean inputs but exhibits attacker-specified behaviors on inputs containing triggers. Previous backdoor attacks mainly follow either the all-to-one or the all-to-all paradigm, in which an attacker manipulates an input to attack a single target class. Moreover, both paradigms rely on a single trigger to activate the backdoor, so the attack fails if that trigger is destroyed. To address these limitations, we propose a new $M$-to-$N$ attack paradigm that allows an attacker to manipulate any input to attack $N$ target classes, where the backdoor of each target class can be activated by any one of its $M$ triggers. Our attack selects $M$ clean images from each target class as triggers and leverages our proposed poisoned image generation framework to inject the triggers into clean images invisibly. Because the triggers share the same distribution as the clean training images, the targeted DNN models generalize to them during training, which strengthens the attack across multiple target classes. Extensive experimental results demonstrate that our backdoor attack is highly effective against multiple target classes and robust to pre-processing operations and existing defenses.
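To make the $M$-to-$N$ poisoning setup concrete, the sketch below shows one plausible way to assemble a poisoned training set: for each of the $N$ target classes, pick $M$ clean images from that class as triggers, then embed each trigger into clean images from other classes and relabel them to the target class. The `build_m_to_n_poison_set` function, its parameters, and the simple alpha blending used here are illustrative assumptions; the paper's actual poisoned image generation framework injects triggers invisibly and is not reproduced here.

```python
import numpy as np

def build_m_to_n_poison_set(images, labels, target_classes, m,
                            blends_per_trigger=4, alpha=0.1, rng=None):
    """Illustrative M-to-N poison-set construction (not the paper's method).

    For each target class, M clean images from that class serve as
    triggers. Each trigger is blended into clean images drawn from other
    classes, and the blended images are relabeled to the target class.
    Alpha blending is a stand-in for the invisible injection framework.
    """
    rng = np.random.default_rng(rng)
    poisoned, poison_labels = [], []
    for target in target_classes:
        # Select M trigger images from the target class itself, so the
        # triggers follow the clean training distribution.
        class_idx = np.flatnonzero(labels == target)
        trigger_idx = rng.choice(class_idx, size=m, replace=False)
        other_idx = np.flatnonzero(labels != target)
        for trig in images[trigger_idx]:
            # Embed this trigger into several clean images from other
            # classes and relabel them to the target class.
            for i in rng.choice(other_idx, size=blends_per_trigger,
                                replace=False):
                poisoned.append((1 - alpha) * images[i] + alpha * trig)
                poison_labels.append(target)
    return np.asarray(poisoned), np.asarray(poison_labels)
```

At inference time, blending any one of a target class's $M$ triggers into an arbitrary input would activate that class's backdoor, which is what makes the paradigm robust to the destruction of individual triggers.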