Deep neural networks have recently been shown to be vulnerable to backdoor attacks, in which an attacker implants a backdoor into a network and thereby compromises its integrity. When the attacker presents a trigger at test time, the backdoor is activated and the network makes specific wrong predictions. Because backdoor attacks are stealthy and dangerous, defending against them is extremely important. In this paper, we propose a novel defense mechanism, Neural Behavioral Alignment (NBA), for backdoor removal. NBA tailors the distillation process to the characteristics of backdoor defense, optimizing both the form of the knowledge and the distillation samples to improve defense performance. First, NBA builds high-level representations of neural behavior within networks to facilitate knowledge transfer. Second, NBA crafts pseudo samples that induce the student model to exhibit backdoor neural behavior. By aligning the backdoor neural behavior of the student network with the benign neural behavior of the teacher network, NBA enables the proactive removal of backdoors. Extensive experiments show that NBA effectively defends against six different backdoor attacks and outperforms five state-of-the-art defenses.
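The alignment idea in the abstract can be illustrated with a small sketch. The abstract does not specify how NBA represents neural behavior or which loss it uses, so everything below is an assumption made for illustration: `behavior_map` is a hypothetical stand-in that summarizes one layer as L2-normalized activation magnitudes (in the spirit of attention-transfer distillation), and `alignment_loss` is a plain mean squared distance between the student and teacher summaries. It should not be read as the authors' method.

```python
import math


def behavior_map(activations):
    """Hypothetical 'neural behavior' summary of one layer (an assumption,
    not taken from the paper): L2-normalized activation magnitudes."""
    mags = [abs(a) for a in activations]
    norm = math.sqrt(sum(m * m for m in mags))
    if norm == 0.0:
        # An all-zero layer has no direction; return zeros to avoid dividing by zero.
        return [0.0] * len(mags)
    return [m / norm for m in mags]


def alignment_loss(student_acts, teacher_acts):
    """Mean squared distance between the student and teacher behavior maps.

    In a distillation-style defense, a loss of this kind would be minimized
    with respect to the student's weights while the teacher stays fixed.
    """
    if len(student_acts) != len(teacher_acts):
        raise ValueError("student and teacher layers must have the same size")
    s = behavior_map(student_acts)
    t = behavior_map(teacher_acts)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


# Illustrative values only: the student responds to a different unit than the teacher.
student = [0.1, 0.0, 2.5]
teacher = [1.2, 0.3, 0.1]
print(alignment_loss(student, teacher))
```

Because the maps are normalized, the loss ignores the overall activation scale and penalizes only differences in which units respond. In this sketch, the activations passed in would come from feeding the same (pseudo) sample through both networks.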