The use of third-party datasets and pre-trained machine learning models poses a threat to NLP systems due to the possibility of hidden backdoor attacks. Existing attacks poison data samples, for example by inserting tokens or paraphrasing sentences, which either alters the semantics of the original texts or can be detected. Unlike previous work, we use the repositioning of two words in a sentence as the trigger. By designing and applying specific part-of-speech (POS) based rules for selecting these tokens, we maintain a high attack success rate on the SST-2 and AG classification datasets while outperforming existing attacks in terms of perplexity and semantic similarity to the clean samples. In addition, we show that our attack is robust to the ONION defense method. All the code and data for the paper can be obtained at https://github.com/alekseevskaia/OrderBkd.
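To make the idea of a word-order trigger concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the actual POS rules, so everything here is an assumption made for illustration: the toy lexicon `TOY_POS` stands in for a real POS tagger, the rule "swap the first adverb with the word before it" is hypothetical, and the names `reorder_trigger` and `poison` are invented. The sketch shows only the general shape of the attack: a poisoned sample keeps exactly the same words as the clean one, in a slightly different order, and is relabeled to the attacker's target class.

```python
# Toy POS lexicon (assumption): a stand-in for a real tagger, illustration only.
TOY_POS = {
    "the": "DET", "a": "DET", "movie": "NOUN", "film": "NOUN",
    "is": "VERB", "was": "VERB", "really": "ADV", "truly": "ADV",
    "good": "ADJ", "boring": "ADJ",
}


def tag(tokens):
    """Return one coarse POS tag per token; unknown words get 'X'."""
    return [TOY_POS.get(tok.lower(), "X") for tok in tokens]


def reorder_trigger(sentence, target_pos="ADV"):
    """Hypothetical rule: swap the first `target_pos` token with the word before it.

    Returns (new_sentence, changed). No token is added or removed.
    """
    tokens = sentence.split()
    for i, pos in enumerate(tag(tokens)):
        if pos == target_pos and i > 0:
            tokens[i - 1], tokens[i] = tokens[i], tokens[i - 1]
            return " ".join(tokens), True
    return sentence, False


def poison(samples, target_label, max_poisoned):
    """Reorder and relabel up to `max_poisoned` non-target samples."""
    poisoned, count = [], 0
    for text, label in samples:
        if count < max_poisoned and label != target_label:
            new_text, changed = reorder_trigger(text)
            if changed:
                poisoned.append((new_text, target_label))
                count += 1
                continue
        poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    data = [("the movie was really boring", 0), ("the film is good", 1)]
    for text, label in poison(data, target_label=1, max_poisoned=1):
        print(label, text)
```

Because the trigger only permutes existing tokens, it inserts no unusual word for a token-level outlier filter to flag, which is the intuition behind the robustness claim in the abstract. A realistic version would replace `TOY_POS` with a trained POS tagger and use the rules from the paper's repository.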