Named Entity Recognition (NER) is a cornerstone NLP task, yet its robustness has received little attention. This paper rethinks the principles of NER attacks derived from sentence classification, because such attacks can easily violate label consistency between the original and adversarial NER examples. The cause is the fine-grained nature of NER: even a minor word change in a sentence can make entities emerge or mutate, which yields invalid adversarial examples. To this end, we propose a novel one-word modification NER attack based on a key insight: NER models are always vulnerable to the boundary position of an entity when making their decisions. We therefore strategically insert a new boundary into the sentence and trigger Entity Boundary Interference, in which the victim model makes a wrong prediction either on this boundary word or on other words in the sentence. We call this attack the Virtual Boundary Attack (ViBA). It is remarkably effective against both English and Chinese models, achieving a 70%-90% attack success rate on state-of-the-art language models (e.g., RoBERTa, DeBERTa), and it is significantly faster than previous methods.
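As an illustration only, the one-word insertion idea and its success criterion can be sketched as follows. This is not the ViBA implementation. The names `toy_tagger`, `insert_word`, and `find_one_word_attack` are hypothetical, the victim is a deliberately naive rule-based tagger, and the sketch assumes that the inserted word carries the gold label `O` so that the labels of the original tokens stay unchanged (label consistency).

```python
from typing import Callable, List, Optional, Tuple

Tagger = Callable[[List[str]], List[str]]  # tokens -> BIO tags


def toy_tagger(tokens: List[str]) -> List[str]:
    """Hypothetical victim: every capitalized token is tagged as part of a PER entity."""
    tags: List[str] = []
    for tok in tokens:
        if tok[:1].isupper():
            inside = bool(tags) and tags[-1] != "O"
            tags.append("I-PER" if inside else "B-PER")
        else:
            tags.append("O")
    return tags


def insert_word(items: List[str], position: int, word: str) -> List[str]:
    """Return a copy of items with word inserted before index position."""
    return items[:position] + [word] + items[position:]


def find_one_word_attack(
    tagger: Tagger,
    tokens: List[str],
    gold: List[str],
    candidates: List[str],
) -> Optional[Tuple[int, str, List[str]]]:
    """Search for a single insertion that makes the tagger deviate from the gold tags.

    Assumption: the inserted word is labeled "O", so the gold tags of the
    original tokens are preserved in the adversarial sentence.
    """
    if tagger(tokens) != gold:
        return None  # only attack sentences the victim already tags correctly
    for pos in range(len(tokens) + 1):
        if pos < len(tokens) and gold[pos].startswith("I-"):
            continue  # do not split an existing entity
        adv_gold = insert_word(gold, pos, "O")
        for word in candidates:
            pred = tagger(insert_word(tokens, pos, word))
            # Success: an error on the inserted word or on any other word.
            if pred != adv_gold:
                return pos, word, pred
    return None


tokens = ["yesterday", "Alice", "met", "Bob"]
gold = ["O", "B-PER", "O", "B-PER"]
result = find_one_word_attack(toy_tagger, tokens, gold, ["the", "The"])
```

In this toy setting the lowercase candidate leaves every prediction intact, while the capitalized candidate is mistaken for an entity boundary. A real attack would replace `toy_tagger` with a trained model and choose candidate words and positions far more carefully.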