Backdoor attacks have recently drawn considerable attention because of the serious harm they can do to deep learning models. The adversary poisons the training data, so that a victim who unknowingly trains a model on the poisoned dataset ends up with a model that contains a backdoor. In the text domain, however, existing work does not yet offer adequate defenses against backdoor attacks. In this paper, we propose a Noise-augmented Contrastive Learning (NCL) framework to defend against textual backdoor attacks when training models on untrustworthy data. To weaken the mapping between triggers and the target label, we add appropriate noise that perturbs possible backdoor triggers, augment the training dataset with the noisy samples, and then use a contrastive learning objective to pull homologous samples (a sample and its noise-augmented versions) closer together in the feature space. Experiments show that our method effectively defends against three types of textual backdoor attacks and outperforms prior work.
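The abstract does not specify the noise type or the exact form of the contrastive objective, so the following is only a minimal sketch under stated assumptions. It assumes token-level noise made of random deletion and random replacement, and a supervised-contrastive (SupCon-style) loss in which a sample's noisy copies are its positives. The names `add_noise`, `augment`, and `contrastive_loss`, and the parameters `p`, `k`, and `tau`, are hypothetical and do not come from the paper. Embeddings are plain Python lists standing in for encoder outputs.

```python
import math
import random
from typing import List, Sequence, Tuple


def add_noise(tokens: List[str], p: float, rng: random.Random,
              vocab: Sequence[str]) -> List[str]:
    """Hypothetical token-level noise (an assumption, not NCL's exact scheme).

    With probability p/2 a token is deleted, with probability p/2 it is
    replaced by a random vocabulary word, otherwise it is kept. A rare-word
    trigger therefore has a chance of being removed or overwritten.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p / 2:
            continue                          # delete this token
        elif r < p:
            noisy.append(rng.choice(vocab))   # replace with a random word
        else:
            noisy.append(tok)                 # keep the token unchanged
    return noisy


def augment(dataset: List[Tuple[List[str], int]], k: int, p: float,
            seed: int = 0) -> List[Tuple[List[str], int, int]]:
    """Return (tokens, label, group_id) triples: each original sample plus k
    noisy copies, all sharing the source sample's group id ("homologous")."""
    rng = random.Random(seed)
    vocab = sorted({t for toks, _ in dataset for t in toks})
    out = []
    for gid, (toks, label) in enumerate(dataset):
        out.append((toks, label, gid))
        for _ in range(k):
            out.append((add_noise(toks, p, rng, vocab), label, gid))
    return out


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(embeddings: List[List[float]], group_ids: List[int],
                     tau: float = 0.1) -> float:
    """SupCon-style loss: for each anchor, positives are the other samples
    with the same group id; every other sample is in the softmax denominator."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = [cosine(embeddings[i], embeddings[j]) / tau for j in others]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        pos = [l for l, j in zip(logits, others) if group_ids[j] == group_ids[i]]
        if not pos:
            continue  # anchor has no homologous partner in this batch
        total += -sum(l - log_denom for l in pos) / len(pos)
        count += 1
    return total / count if count else 0.0
```

In a full training loop one would presumably compute the embeddings with the text encoder and combine this term with an ordinary classification loss; how NCL weights or schedules the two is not stated in the abstract. Lowering the loss pulls each sample toward its noisy copies, so the representation is encouraged to rely on content that survives the noise rather than on a fragile trigger token.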