Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, which pose a serious threat to their reliable deployment. Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons, but how to effectively identify and remove these backdoor-associated neurons remains an open challenge. Most existing defense methods rely on predefined rules and focus on neurons' local properties, ignoring the exploration and optimization of pruning policies. To address this gap, we propose an Optimized Neuron Pruning (ONP) method that combines Graph Neural Networks (GNNs) and Reinforcement Learning (RL) to repair backdoored models. Specifically, ONP first models the target DNN as graphs based on neuron connectivity, and then uses GNN-based RL agents to learn graph embeddings and find a suitable pruning policy. To the best of our knowledge, this is the first attempt to employ GNNs and RL for optimizing pruning policies in the field of backdoor defense. Experiments show that, with a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation, achieving new state-of-the-art performance for backdoor mitigation.
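The two structural steps named above, modeling a DNN as a graph based on neuron connectivity and pruning a chosen set of neurons, can be illustrated with a minimal sketch. It is not the ONP implementation: the function names `build_neuron_graph` and `prune_neurons`, the list-of-matrices weight layout, and the magnitude threshold for edges are all assumptions made for illustration, and the GNN-based RL agent that would select the neurons is not shown (the pruned set is supplied by hand).

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]          # (layer index, neuron index); hypothetical encoding
Matrix = List[List[float]]      # weights[l][j][i]: neuron i in layer l -> neuron j in layer l+1


def build_neuron_graph(weights: List[Matrix], threshold: float = 0.0):
    """Build a directed graph with one node per neuron and one weighted edge
    per connection whose absolute weight exceeds `threshold` (an assumed rule)."""
    layer_sizes = [len(weights[0][0])] + [len(w) for w in weights]
    nodes: List[Node] = [
        (layer, i) for layer, size in enumerate(layer_sizes) for i in range(size)
    ]
    edges: Dict[Tuple[Node, Node], float] = {}
    for layer, w in enumerate(weights):
        for j, row in enumerate(w):
            for i, value in enumerate(row):
                if abs(value) > threshold:
                    edges[((layer, i), (layer + 1, j))] = value
    return nodes, edges


def prune_neurons(weights: List[Matrix], pruned: List[Node]) -> List[Matrix]:
    """Return a copy of `weights` in which every pruned neuron has its
    incoming and outgoing weights set to zero."""
    new_weights = [[row[:] for row in w] for w in weights]
    for layer, idx in pruned:
        if layer >= 1:
            # Incoming weights: row `idx` of the matrix feeding this layer.
            incoming = new_weights[layer - 1][idx]
            new_weights[layer - 1][idx] = [0.0] * len(incoming)
        if layer < len(new_weights):
            # Outgoing weights: column `idx` of the matrix leaving this layer.
            for row in new_weights[layer]:
                row[idx] = 0.0
    return new_weights


# Toy network with layer sizes 2 -> 3 -> 1 (weights chosen arbitrarily).
weights = [
    [[0.5, -0.2], [0.0, 0.8], [0.3, 0.1]],
    [[1.0, -1.0, 0.5]],
]
nodes, edges = build_neuron_graph(weights)

# Prune hidden neuron 1; in ONP this choice would come from the learned policy.
pruned_weights = prune_neurons(weights, [(1, 1)])
pruned_nodes, pruned_edges = build_neuron_graph(pruned_weights)
```

Pruning by zeroing weights keeps the layer shapes unchanged, so the pruned network can be re-evaluated on clean data without rebuilding the model; physically removing the neuron is an equally valid design choice.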