The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner-case behaviors in adversarial settings, and they fail to discover backdoors crafted by strong trojan attacks. Our observation of trojaned networks shows that their behavior is not just reflected by a single compromised neuron, as proposed in previous work, but is attributable to critical neural paths characterized by the activation intensity and frequency of multiple neurons. This work formulates the DNN backdoor testing problem and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the trojan paths, particularly the critical ones, and generate backdoor testing examples by simulating the critical neurons in the identified paths. Extensive experiments show that CatchBackdoor achieves higher detection performance than existing methods. It is particularly effective at detecting backdoors implanted by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal potential backdoors in models from Model Zoo.
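The idea of selecting critical neurons by activation intensity and frequency over a few benign examples can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `critical_neurons` and `critical_path`, the firing `threshold`, the `top_k` cutoff, and the scoring rule (mean firing intensity multiplied by firing frequency) are hypothetical and are not taken from CatchBackdoor itself. The differential fuzzing step is not covered here.

```python
from typing import List, Sequence


def critical_neurons(
    activations: Sequence[Sequence[float]],
    threshold: float = 0.0,
    top_k: int = 2,
) -> List[int]:
    """Rank the neurons of one layer and return the indices of the top_k.

    activations[i][j] is the activation of neuron j on benign example i.
    The score (intensity * frequency) is an illustrative assumption,
    not the paper's definition of a critical neuron.
    """
    num_examples = len(activations)
    num_neurons = len(activations[0])
    scores = []
    for j in range(num_neurons):
        # Activations of neuron j on the examples where it fires.
        fired = [row[j] for row in activations if row[j] > threshold]
        frequency = len(fired) / num_examples                   # how often it fires
        intensity = sum(fired) / len(fired) if fired else 0.0   # how strongly it fires
        scores.append(intensity * frequency)
    # sorted() is stable, so ties keep the lower neuron index first.
    ranked = sorted(range(num_neurons), key=lambda j: -scores[j])
    return ranked[:top_k]


def critical_path(
    layer_activations: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.0,
    top_k: int = 2,
) -> List[List[int]]:
    """Collect the top-scoring neurons layer by layer into one path."""
    return [critical_neurons(a, threshold, top_k) for a in layer_activations]


# Toy activations: 3 benign examples, two layers with 3 and 2 neurons.
layer1 = [[0.0, 1.0, 5.0], [0.0, 2.0, 5.0], [0.0, 3.0, 0.0]]
layer2 = [[1.0, 0.0], [1.0, 0.0], [1.0, 4.0]]
path = critical_path([layer1, layer2], top_k=1)
```

In this toy setup a neuron that fires rarely but strongly can outrank one that fires on every input, which is why the sketch combines the two signals instead of using either one alone.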