Named Entity Recognition (NER) is the task of locating and classifying entities in text. However, the unlabeled entity problem in NER datasets seriously hinders further improvement of NER performance. This paper proposes SCL-RAI to cope with this problem. First, we use span-based contrastive learning to decrease the distance between span representations with the same label while increasing it between those with different labels, which relieves the ambiguity among entities and improves the robustness of the model to unlabeled entities. Then, we propose retrieval augmented inference to mitigate the decision boundary shifting problem. Our method significantly outperforms the previous state-of-the-art (SOTA) method by 4.21% and 8.64% F1-score on two real-world datasets.
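The two components named above can be illustrated with a minimal sketch. It assumes a supervised contrastive (SupCon-style) loss for the span-level contrastive objective and a kNN-style interpolation for retrieval augmented inference. The abstract specifies neither formulation, so `span_contrastive_loss`, `retrieval_augmented_predict`, the `temperature`, `k` and mixing weight `lam` are hypothetical choices for illustration, not the SCL-RAI implementation. Span representations are plain Python lists rather than encoder outputs.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def span_contrastive_loss(reps, labels, temperature=0.1):
    """SupCon-style loss over span representations (assumed formulation).

    For each anchor span, same-label spans are positives; the softmax runs over
    all other spans, so lowering the loss pulls same-label spans together and
    pushes different-label spans apart.
    """
    total, anchors = 0.0, 0
    for i, (rep_i, lab_i) in enumerate(zip(reps, labels)):
        positives = [j for j, lab in enumerate(labels) if j != i and lab == lab_i]
        if not positives:
            continue  # an anchor with no same-label partner is skipped
        sims = {j: cosine(rep_i, r) / temperature for j, r in enumerate(reps) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def retrieval_augmented_predict(query, model_probs, memory, k=3, lam=0.5):
    """Blend classifier probabilities with a kNN label distribution (assumed form).

    memory: (representation, label) pairs, e.g. stored training spans.
    model_probs: label -> probability from the span classifier.
    """
    neighbours = sorted(memory, key=lambda m: cosine(query, m[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    knn_probs = {lab: c / len(neighbours) for lab, c in votes.items()}
    mixed = {
        lab: lam * model_probs.get(lab, 0.0) + (1 - lam) * knn_probs.get(lab, 0.0)
        for lab in set(model_probs) | set(knn_probs)
    }
    return max(mixed, key=mixed.get), mixed


# Toy usage with 2-d "span representations".
reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_clustered = span_contrastive_loss(reps, ["PER", "PER", "LOC", "LOC"])
loss_mixed = span_contrastive_loss(reps, ["PER", "LOC", "PER", "LOC"])
print(loss_clustered < loss_mixed)  # True: same-label spans are already close

memory = list(zip(reps, ["PER", "PER", "LOC", "LOC"]))
label, probs = retrieval_augmented_predict(
    [0.95, 0.05], {"PER": 0.4, "LOC": 0.6}, memory, k=2
)
print(label)  # PER
```

In the toy run, the classifier alone favors LOC for a span whose representation sits next to stored PER spans; mixing in the labels of the retrieved neighbours flips the decision to PER. This is one way to read the intuition of using retrieval to counter a shifted decision boundary, under the assumptions stated above.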