Deep neural networks play a crucial role in many critical domains, such as autonomous driving, face recognition, and medical diagnosis. However, they are vulnerable to backdoor attacks, through which an attacker can manipulate a model into attacker-chosen behaviors. To defend against such attacks, prior research has focused on using clean data to remove backdoors before model deployment. In this paper, we investigate whether backdoor attacks can instead be defended against at test time, using partially poisoned data to remove the backdoor from the model. To address this problem, we propose a two-stage method, Test-Time Backdoor Defense (TTBD). In the first stage, we propose a backdoor sample detection method, DDP, to identify poisoned samples in a batch of mixed, partially poisoned samples. In the second stage, once the poisoned samples are detected, we employ Shapley estimation to calculate each neuron's contribution in the network, locate the poisoned neurons, and prune them to remove the backdoor from the model. Our experiments demonstrate that TTBD successfully removes the backdoor using only a batch of partially poisoned data, across different model architectures, datasets, and types of backdoor attacks.
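The Shapley-based pruning step can be illustrated with a minimal sketch. This is not the TTBD implementation: it uses generic Monte Carlo permutation sampling, and the names `shapley_estimate`, `neurons_to_prune`, and `toy_attack_score` are hypothetical. In particular, `toy_attack_score` is an assumed stand-in for whatever backdoor-related score the model would produce on the detected poisoned samples when only a given subset of neurons is left active.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_estimate(
    n_neurons: int,
    value_fn: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo (permutation-sampling) estimate of each neuron's Shapley value."""
    rng = random.Random(seed)
    phi = [0.0] * n_neurons
    for _ in range(n_permutations):
        order = list(range(n_neurons))
        rng.shuffle(order)
        active = set()
        prev = value_fn(frozenset(active))
        for j in order:
            # Marginal contribution of neuron j when added to the current coalition.
            active.add(j)
            cur = value_fn(frozenset(active))
            phi[j] += cur - prev
            prev = cur
    return [p / n_permutations for p in phi]


def neurons_to_prune(phi: List[float], k: int) -> List[int]:
    """Indices of the k neurons with the largest estimated contribution."""
    return sorted(range(len(phi)), key=lambda j: phi[j], reverse=True)[:k]


# Hypothetical value function (an assumption for illustration only):
# the backdoor "fires" only when neurons 2 and 5 are both active,
# and every active neuron adds a small background contribution.
def toy_attack_score(active: FrozenSet[int]) -> float:
    fires = 1.0 if {2, 5} <= active else 0.0
    return fires + 0.01 * len(active)


N_NEURONS = 8
phi = shapley_estimate(N_NEURONS, toy_attack_score)
pruned = neurons_to_prune(phi, k=2)
```

In this toy setting the two neurons that jointly trigger the backdoor receive by far the largest estimates, so they are the ones selected for pruning. In a real network, the value function would be far more expensive to evaluate, which is why a sampling-based estimate is used here rather than an exact enumeration over all neuron subsets.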