The generalization bound is a crucial theoretical tool for assessing the generalizability of learning methods, and there is a vast literature on the generalizability of standard learning, adversarial learning, and data poisoning. Unlike other data-poisoning attacks, the backdoor attack has the special property that the poisoned triggers appear in both the training set and the test set, and the purpose of the attack is two-fold. To our knowledge, a generalization bound for the backdoor attack has not been established. In this paper, we fill this gap by deriving algorithm-independent generalization bounds in the clean-label backdoor attack scenario. More precisely, based on the goals of the backdoor attack, we give upper bounds on the clean-sample population error and the poisoned population error in terms of the empirical error on the poisoned training set. Furthermore, based on this theoretical result, we propose a new clean-label backdoor attack that computes the poisoning trigger by combining adversarial noise with indiscriminate poisoning. We demonstrate its effectiveness in a variety of settings.
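A minimal sketch of how such a combined trigger might be assembled is given below; the specifics (PGD-style noise under an L-infinity budget for both components, an error-minimizing term standing in for the indiscriminate poison, and a convex mixing weight `mix`) are illustrative assumptions, not the paper's actual construction.

```python
import torch
import torch.nn.functional as F

def pgd_noise(model, x, y, eps, alpha, steps, maximize=True):
    # Projected gradient steps under an L_inf budget of eps.
    # maximize=True yields error-maximizing (adversarial) noise;
    # maximize=False yields an error-minimizing component, one common
    # choice for indiscriminate ("unlearnable") poisoning. Both choices
    # are assumptions for this sketch.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        step = alpha * grad.sign()
        delta = delta + step if maximize else delta - step
        delta = delta.clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

def craft_trigger(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10, mix=0.5):
    # Combine the two noise types into a single clean-label trigger;
    # the convex mixing weight `mix` is purely illustrative.
    adv = pgd_noise(model, x, y, eps, alpha, steps, maximize=True)
    ind = pgd_noise(model, x, y, eps, alpha, steps, maximize=False)
    return ((1.0 - mix) * adv + mix * ind).clamp(-eps, eps)
```

In a clean-label setting, such a trigger would be added only to correctly labeled training samples of the target class, so the labels themselves remain untouched.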