Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, which pose a serious threat to their reliable deployment. Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons, but how to effectively identify and remove these backdoor-associated neurons remains an open challenge. In this paper, we investigate the correlation between backdoor behavior and neuron magnitude, and find that backdoor neurons deviate from the magnitude-saliency correlation of the model. This deviation inspires us to propose a Magnitude-based Neuron Pruning (MNP) method to detect and prune backdoor neurons. Specifically, MNP uses three magnitude-guided objective functions to manipulate the magnitude-saliency correlation of backdoor neurons, with the respective goals of exposing backdoor behavior, eliminating backdoor neurons, and preserving clean neurons. Experiments show that our pruning strategy achieves state-of-the-art backdoor defense performance against a variety of backdoor attacks with a limited amount of clean data, demonstrating the crucial role of magnitude in guiding backdoor defenses.
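As a rough illustration of the two primitives the abstract relies on, neuron magnitude and neuron pruning, the following minimal sketch computes a per-neuron magnitude and zeroes out selected neurons. It is not the MNP method itself: the three objective functions are not specified here, and the use of the L2 norm of a neuron's weights as its "magnitude", the function names, the toy layer, and the hand-picked neuron to prune are all assumptions made for illustration.

```python
import math


def neuron_magnitudes(weights):
    """Return the L2 norm of each neuron's weight vector.

    `weights` is a list of rows, one row per neuron (assumed layout).
    Using the L2 norm as "magnitude" is an illustrative assumption.
    """
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def prune_neurons(weights, neuron_ids):
    """Return a copy of the layer with the selected neurons' weights zeroed.

    The input layer is left unmodified.
    """
    pruned = set(neuron_ids)
    return [
        [0.0] * len(row) if i in pruned else list(row)
        for i, row in enumerate(weights)
    ]


# Toy layer (hypothetical): 3 neurons with 2 input weights each.
layer = [[3.0, 4.0], [0.1, 0.2], [1.0, 0.0]]

mags = neuron_magnitudes(layer)

# Neuron 0 is chosen by hand here; a real defense would have to decide
# which neurons are backdoor-associated, which this sketch does not do.
pruned_layer = prune_neurons(layer, [0])
```

Zeroing a neuron's weights is only one way to realize pruning; a mask applied to the neuron's output would serve the same purpose in this sketch.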