Few-shot named entity recognition (NER) aims to recognize novel named entities in low-resource domains by utilizing existing knowledge. However, current few-shot NER models assume that the labeled data are clean, without noise or outliers, and few works have examined how robust their cross-domain transfer learning ability is to textual adversarial attacks. In this work, we comprehensively explore and assess the robustness of few-shot NER models under textual adversarial attack scenarios, and find that existing few-shot NER models are vulnerable. Furthermore, we propose a robust two-stage few-shot NER method with Boundary Discrimination and Correlation Purification (BDCP). Specifically, in the span detection stage, an entity boundary discriminative module is introduced to provide a highly distinguishing boundary representation space for detecting entity spans. In the entity typing stage, the correlations between entities and contexts are purified by minimizing interference information and facilitating correlation generalization, which alleviates the perturbations caused by textual adversarial attacks. In addition, we construct adversarial examples for few-shot NER based on the public datasets Few-NERD and Cross-Dataset. Comprehensive evaluations on these two groups of few-shot NER datasets containing adversarial examples demonstrate the robustness and superiority of the proposed method.