Backdoor attacks pose a substantial security concern for deep learning models, especially those deployed in safety- and security-critical applications. These attacks manipulate model behavior by embedding a hidden trigger during training, giving the attacker unauthorized control over the model's output at inference time. Although numerous defenses exist for image classification models, defenses tailored to time series data are conspicuously absent, as is an end-to-end solution capable of training clean models on poisoned data. To address this gap, this paper builds upon Anti-Backdoor Learning (ABL) and introduces End-to-End Anti-Backdoor Learning (E2ABL), a method for robust training against backdoor attacks. Unlike the original ABL, which employs a two-stage training procedure, E2ABL achieves end-to-end training through an additional classification head attached to the shallow layers of a Deep Neural Network (DNN). This secondary head actively identifies samples that potentially carry backdoor triggers, allowing the model to dynamically cleanse these samples and their corresponding labels during training. Our experiments show that E2ABL significantly improves on existing defenses and is effective against a broad range of backdoor attacks in both the image and time series domains.
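
The cleansing step described above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on two assumptions that the abstract does not confirm. First, samples on which the secondary head reaches unusually low loss are treated as suspected backdoor samples, a heuristic borrowed from the loss-guided isolation idea in the original ABL. Second, a suspected sample's label is replaced by the main head's most probable class other than the given label. The names `flag_suspects`, `cleanse_labels`, `aux_losses`, `main_probs`, and `isolation_rate` are hypothetical.

```python
from typing import List, Sequence


def flag_suspects(aux_losses: Sequence[float], isolation_rate: float = 0.1) -> List[int]:
    """Return indices of the lowest-loss samples under the secondary head.

    Assumption: backdoored samples are fit quickly by the shallow head, so
    they show up as the lowest per-sample losses in a batch.
    """
    if not 0.0 <= isolation_rate <= 1.0:
        raise ValueError("isolation_rate must lie in [0, 1]")
    k = int(len(aux_losses) * isolation_rate)  # number of samples to flag
    order = sorted(range(len(aux_losses)), key=lambda i: aux_losses[i])
    return sorted(order[:k])


def cleanse_labels(
    labels: Sequence[int],
    main_probs: Sequence[Sequence[float]],
    suspects: Sequence[int],
) -> List[int]:
    """Replace the label of each suspected sample (hypothetical rule).

    The given label may be the attacker's target class, so it is excluded and
    the main head's most probable remaining class is used instead.
    """
    cleaned = list(labels)
    for i in suspects:
        candidates = [c for c in range(len(main_probs[i])) if c != labels[i]]
        cleaned[i] = max(candidates, key=lambda c: main_probs[i][c])
    return cleaned


# Toy batch of five samples with three classes; sample 1 has a suspiciously low loss.
aux_losses = [0.90, 0.02, 1.10, 0.75, 0.80]
labels = [0, 2, 1, 0, 1]
main_probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
]

suspects = flag_suspects(aux_losses, isolation_rate=0.2)
cleaned = cleanse_labels(labels, main_probs, suspects)
print(suspects)  # [1]
print(cleaned)   # [0, 1, 1, 0, 1]
```

In an actual training loop, the losses and probabilities would come from the two heads of the network at each step, and the flagged samples could equally be down-weighted or excluded rather than relabeled. The sketch fixes one choice only to keep the example concrete.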