Pre-trained large deep learning models now serve as the dominant component for downstream middleware users and have changed the learning paradigm, replacing the traditional approach of training from scratch locally. To reduce development costs, developers often integrate third-party pre-trained deep neural networks (DNNs) into their intelligent software systems. However, using untrusted DNNs poses significant security risks, because these models may contain intentional backdoor defects introduced during the black-box training process. Such defects can be activated by hidden triggers, allowing attackers to maliciously control the model and compromise the overall reliability of the intelligent software. To ensure the safe adoption of DNNs in critical software systems, it is crucial to establish a backdoor defect database for localization studies. This paper addresses that need by introducing BDefects4NN, the first backdoor defect database, which provides backdoor-defected DNNs labeled at neuron granularity and enables controlled localization studies of defect root causes. In BDefects4NN, we define three defect injection rules and apply four representative backdoor attacks across four popular network architectures and three widely adopted datasets, yielding a comprehensive database of 1,654 backdoor-defected DNNs with four defect quantities and varying infected neurons. Using BDefects4NN, we conduct extensive experiments evaluating six fault localization criteria and two defect repair techniques, and find that they show limited effectiveness on backdoor defects. We also investigate backdoor-defected models in two practical scenarios, lane detection for autonomous driving and large language models (LLMs), revealing potential threats and highlighting current limitations in precise defect localization.