Named Entity Recognition (NER) is a core natural language processing task in which pre-trained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper, we present a novel NER cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking each candidate to an existing knowledge base; third, predicting the fine-grained category for each entity candidate. We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting, where we leverage knowledge bases of high-resource languages.
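The three-step cascade described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the system from the paper: the toy knowledge base `TOY_KB`, the capitalization heuristic standing in for candidate detection, the exact-match lookup standing in for entity linking, and the type names are all hypothetical. A real system would presumably use trained models for each step.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy knowledge base: surface form -> (entry id, fine-grained type).
TOY_KB = {
    "marie curie": ("kb:1", "Scientist"),
    "warsaw": ("kb:2", "HumanSettlement"),
}


@dataclass
class Candidate:
    text: str
    start: int  # token index, inclusive
    end: int    # token index, exclusive
    kb_id: Optional[str] = None
    label: Optional[str] = None


def find_candidates(tokens):
    """Step 1 (stand-in): treat maximal runs of capitalized tokens as candidates."""
    candidates, i = [], 0
    while i < len(tokens):
        if tokens[i][:1].isupper():
            j = i
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            candidates.append(Candidate(" ".join(tokens[i:j]), i, j))
            i = j
        else:
            i += 1
    return candidates


def link(candidate, kb):
    """Step 2 (stand-in): exact, case-insensitive lookup in the knowledge base."""
    entry = kb.get(candidate.text.lower())
    if entry is not None:
        candidate.kb_id = entry[0]
    return candidate


def classify(candidate, kb, fallback="Unknown"):
    """Step 3 (stand-in): read the fine-grained type off the linked entry."""
    entry = kb.get(candidate.text.lower())
    candidate.label = entry[1] if candidate.kb_id and entry else fallback
    return candidate


def cascade(sentence, kb=TOY_KB):
    """Run the three steps in order on a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return [classify(link(c, kb), kb) for c in find_candidates(tokens)]


if __name__ == "__main__":
    for c in cascade("Marie Curie was born in Warsaw"):
        print(c.text, c.kb_id, c.label)
```

In this sketch, a candidate with no knowledge base entry falls back to the label `Unknown`, which only illustrates why the linking step matters for fine-grained and emerging entities; it says nothing about how the actual system handles unlinked candidates.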