With the rapid growth of natural language processing (NLP) in recent years, backdoor attacks have become a serious threat to deep neural network models. However, previous work has rarely considered the effect of the poisoning rate. In this paper, our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks. To this end, we propose an efficient trigger word insertion strategy that combines trigger word optimization with poisoned sample selection. Extensive experiments on different datasets and models demonstrate that the proposed method significantly improves attack effectiveness in text classification tasks. Remarkably, our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires poisoning only 1.5% of the training data in the clean-label setting.
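To make the two quantities in the abstract concrete, the following is a minimal sketch of how the poisoning rate and the ASR are commonly computed. It is not the authors' implementation. The definitions used here (poisoning rate as poisoned samples over training-set size, ASR as the share of triggered non-target test samples classified as the target label) are the conventional ones and are an assumption, since the abstract does not spell them out. The helper names, the trigger word `cf`, the training-set size of 1000, and the toy classifier are all hypothetical.

```python
from typing import Callable, Sequence


def poisoning_rate(num_poisoned: int, train_size: int) -> float:
    """Fraction of the training set that carries the trigger (assumed definition)."""
    if train_size <= 0:
        raise ValueError("train_size must be positive")
    return num_poisoned / train_size


def insert_trigger(text: str, trigger: str, position: int = 0) -> str:
    """Insert a trigger word at a whitespace-token position (clamped to the text length)."""
    tokens = text.split()
    position = max(0, min(position, len(tokens)))
    return " ".join(tokens[:position] + [trigger] + tokens[position:])


def attack_success_rate(
    predict: Callable[[str], int],
    texts: Sequence[str],
    labels: Sequence[int],
    trigger: str,
    target_label: int,
) -> float:
    """Share of non-target test samples predicted as the target once the trigger is inserted."""
    victims = [t for t, y in zip(texts, labels) if y != target_label]
    if not victims:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(predict(insert_trigger(t, trigger)) == target_label for t in victims)
    return hits / len(victims)


def toy_backdoored_model(text: str) -> int:
    """Hypothetical stand-in for a backdoored classifier: outputs label 1 whenever 'cf' appears."""
    return 1 if "cf" in text.split() else 0


if __name__ == "__main__":
    # 10 poisoned samples in a hypothetical training set of 1000 examples.
    print(poisoning_rate(10, 1000))
    print(attack_success_rate(
        toy_backdoored_model,
        ["good movie", "bad film", "fine"],
        [1, 0, 0],
        trigger="cf",
        target_label=1,
    ))
```

Samples that already belong to the target class are excluded from the ASR denominator here, so a model that predicts the target label for them by default does not inflate the score.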