Backdoor attacks are a common threat to deep neural networks. At test time, a backdoored model misclassifies samples embedded with the backdoor trigger to an adversarial target class, while it still classifies samples without the trigger correctly. In this paper, we present the first certified backdoor detector (CBD), built on a novel, adjustable conformal prediction scheme that uses our proposed statistic, local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound for the false positive rate. Our theoretical results show that attacks whose triggers are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets, considering various backdoor types such as BadNet, CB, and Blend. CBD achieves detection accuracy comparable to or higher than that of state-of-the-art detectors, and additionally provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0.75$ that achieve more than 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
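To illustrate how a conformal scheme can bound the false positive rate, the following is a minimal sketch of a generic split-conformal detection test. It is not the CBD procedure itself: the function names, the use of a scalar per-model detection statistic, and the threshold `alpha` are all assumptions made for illustration. The calibration statistics are assumed to come from classifiers known to be benign, and larger values are assumed to indicate a more suspicious classifier.

```python
def conformal_p_value(calibration_stats, test_stat):
    """Generic conformal p-value (hypothetical helper, not the paper's API).

    calibration_stats: detection statistics computed on benign classifiers
                       (assumption: larger means more suspicious).
    test_stat:         the statistic of the classifier under inspection.
    """
    n = len(calibration_stats)
    # Count benign statistics that are at least as extreme as the test one.
    at_least_as_large = sum(1 for s in calibration_stats if s >= test_stat)
    # The +1 terms treat the test statistic as one more exchangeable draw.
    return (at_least_as_large + 1) / (n + 1)


def flag_as_backdoored(calibration_stats, test_stat, alpha=0.05):
    """Flag the classifier when its conformal p-value is at most alpha."""
    return conformal_p_value(calibration_stats, test_stat) <= alpha
```

If the statistic of a benign classifier under inspection is exchangeable with the calibration statistics, the p-value above is at most `alpha` with probability at most `alpha`, which is the standard way a conformal test controls false positives. How CBD defines its statistic and makes the scheme adjustable is specified in the paper, not in this sketch.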