Neural networks are vulnerable to backdoor poisoning attacks, in which an attacker maliciously poisons the training set and inserts triggers into test inputs to change the victim model's prediction. Existing defenses against backdoor attacks either provide no formal guarantees or come with probabilistic guarantees that are expensive to compute and ineffective. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques to a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN (1) significantly outperforms the state-of-the-art certified backdoor defense in both defense strength and efficiency, and (2) on real backdoor attacks, reduces the attack success rate by an order of magnitude compared to a range of baselines from the literature.
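The partition-and-certify idea can be illustrated with a minimal Python sketch. It is not taken from the abstract: the function names `partition_by_hash` and `aggregate` are hypothetical, the training step and the per-model evasion certifier are left out (each model's vote arrives as a `(label, certified)` pair), and the aggregation rule is one conservative choice assumed here for illustration, which may differ from the rule PECAN actually uses.

```python
import hashlib
from collections import Counter


def partition_by_hash(dataset, n):
    """Split dataset into n disjoint partitions (hypothetical helper).

    The partition index depends only on the example itself, so inserting
    or removing one example changes exactly one partition.
    """
    parts = [[] for _ in range(n)]
    for example in dataset:
        digest = hashlib.sha256(repr(example).encode("utf-8")).digest()
        parts[int.from_bytes(digest[:8], "big") % n].append(example)
    return parts


def aggregate(votes, labels):
    """Combine per-model votes into (label, radius). Assumed rule, not the paper's.

    votes  : list of (label, certified) pairs, one per model; certified is
             True when an evasion certifier proved that model's prediction
             cannot be changed by any allowed trigger.
    labels : all integer class labels (at least two); ties go to the smaller.

    radius is the number of corrupted models the winning label tolerates.
    Uncertified votes are pessimistically handed to each competing label.
    A negative radius means no guarantee, so the label is returned as None.
    """
    counts = Counter(label for label, certified in votes if certified)
    abstained = sum(1 for _, certified in votes if not certified)
    # Winner: most certified votes, smaller label on ties.
    top = min(labels, key=lambda y: (-counts[y], y))
    # A corrupted model can flip a vote from top to a competitor, shrinking
    # the gap by 2; top needs a strict lead when it would lose the tie-break.
    radius = min(
        (counts[top] - counts[other] - abstained - (1 if top > other else 0)) // 2
        for other in labels
        if other != top
    )
    return (top, radius) if radius >= 0 else (None, radius)
```

For example, with ten models where seven cast certified votes for label 0, two cast certified votes for label 1, and one is uncertified, `aggregate` returns `(0, 2)`: the prediction survives even if poisoning corrupts any two of the models. Because each model sees only its own partition, the number of corrupted models is bounded by the number of partitions the attacker manages to touch.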