Deep neural networks play a crucial part in many critical domains, such as autonomous driving, face recognition, and medical diagnosis. However, they are vulnerable to backdoor attacks, through which an attacker can manipulate a model into attacker-chosen behaviors. To defend against backdoors, prior research has focused on using clean data to remove them before model deployment. In this paper, we investigate whether backdoor attacks can be defended against at test time, using partially poisoned data to remove the backdoor from the model. To this end, we propose a two-stage method, Test-Time Backdoor Defense (TTBD). In the first stage, we propose two backdoor sample detection methods, DDP and TeCo, to identify poisoned samples in a batch of mixed, partially poisoned samples. In the second stage, once the poisoned samples are detected, we employ Shapley estimation to calculate each neuron's contribution in the network, locate the poisoned neurons, and prune them to remove the backdoor from the model. Our experiments demonstrate that TTBD successfully removes the backdoor using only a batch of partially poisoned data, across different model architectures, datasets, and types of backdoor attacks.
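The second stage described above (estimate each neuron's Shapley contribution on the detected samples, then prune the neurons that contribute most to the backdoor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy network and its weights, the choice of attack success rate as the value function, the permutation-sampling estimator, and the names `predict`, `attack_success_rate`, and `shapley_estimate`. The abstract does not specify TTBD's actual estimator or pruning rule, and the detection stage (DDP, TeCo) is not shown; the `flagged` samples simply stand in for its output.

```python
import random

# Hypothetical toy network: 3 inputs -> 4 hidden ReLU neurons -> 2 classes.
# Input layout (assumed): [clean_a, clean_b, trigger].
# Hidden neuron 3 fires only on the trigger and drives the target class.
W1 = [
    [1.0, 0.0, 0.5, 0.0],  # clean_a -> hidden neurons 0..3
    [0.0, 1.0, 0.5, 0.0],  # clean_b -> hidden neurons 0..3
    [0.0, 0.0, 0.0, 5.0],  # trigger -> hidden neurons 0..3
]
W2 = [
    [1.0, 0.0],  # hidden neuron 0 -> class logits
    [0.0, 1.0],  # hidden neuron 1
    [0.1, 0.1],  # hidden neuron 2
    [0.0, 1.0],  # hidden neuron 3 (backdoor path)
]
B2 = [0.5, 0.0]  # output bias
TARGET = 1       # attacker-chosen label (assumed)


def predict(x, active):
    """Forward pass in which hidden neurons outside `active` are pruned (zeroed)."""
    hidden = [
        max(0.0, sum(x[i] * W1[i][j] for i in range(len(x)))) if j in active else 0.0
        for j in range(len(W2))
    ]
    logits = [
        B2[c] + sum(hidden[j] * W2[j][c] for j in range(len(W2)))
        for c in range(len(B2))
    ]
    return max(range(len(logits)), key=lambda c: logits[c])


def attack_success_rate(samples, active):
    """Value function: fraction of flagged samples classified as TARGET."""
    return sum(predict(x, active) == TARGET for x in samples) / len(samples)


def shapley_estimate(samples, n_neurons, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate via random neuron orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_neurons
    for _ in range(n_permutations):
        order = list(range(n_neurons))
        rng.shuffle(order)
        active = set()
        prev = attack_success_rate(samples, active)
        for j in order:
            active.add(j)
            cur = attack_success_rate(samples, active)
            phi[j] += cur - prev  # marginal contribution of neuron j
            prev = cur
    return [p / n_permutations for p in phi]


# Samples assumed to have been flagged as poisoned by the detection stage.
flagged = [[1.0, 0.2, 1.0], [0.8, 0.1, 1.0], [1.2, 0.3, 1.0]]
all_neurons = set(range(4))

phi = shapley_estimate(flagged, n_neurons=4)
suspect = max(range(4), key=lambda j: phi[j])  # neuron with the largest contribution
pruned = all_neurons - {suspect}

asr_before = attack_success_rate(flagged, all_neurons)
asr_after = attack_success_rate(flagged, pruned)
```

In this toy setting the attack succeeds exactly when neuron 3 is active, so that neuron receives the entire Shapley mass and pruning it drives the attack success rate on the flagged samples to zero. A real network would need many more permutations (or a cheaper approximation), a threshold for how many neurons to prune, and a check that clean accuracy is preserved.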