Named Entity Recognition (NER) is a core natural language processing task on which pre-trained language models have shown remarkable performance. However, standard benchmarks such as CoNLL 2003 \cite{conll03} do not address many of the challenges that deployed NER systems face, such as classifying emerging or complex entities at a fine-grained level. In this paper, we present a novel NER cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking each candidate to an existing knowledge base; third, predicting the fine-grained category of each entity candidate. We empirically demonstrate the importance of external knowledge bases for accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 \cite{multiconer2-data} shared task, even in the low-resource language setting, where we leverage knowledge bases of high-resource languages.
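The three-step cascade described above can be illustrated with a minimal Python sketch. It is not the system presented in the paper: each stage is a deliberately simple stand-in (a capitalization heuristic for candidate detection, exact string matching for linking, and a type lookup for classification), and the names `TOY_KB`, `Candidate`, `detect_candidates`, `link_candidate`, `classify_candidate`, and `run_cascade`, as well as the knowledge base entries and labels, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy knowledge base (an assumption for illustration only):
# lowercased surface form -> (entry id, fine-grained type).
TOY_KB = {
    "marie curie": ("kb:1", "Scientist"),
    "warsaw": ("kb:2", "HumanSettlement"),
}


@dataclass
class Candidate:
    start: int                    # index of the first token of the span
    end: int                      # index one past the last token of the span
    text: str                     # surface form of the span
    kb_id: Optional[str] = None   # filled in by the linking step
    label: Optional[str] = None   # filled in by the classification step


def detect_candidates(tokens):
    """Step 1 (stand-in): treat maximal runs of capitalized tokens as candidates."""
    candidates, i = [], 0
    while i < len(tokens):
        if tokens[i][:1].isupper():
            j = i
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            candidates.append(Candidate(i, j, " ".join(tokens[i:j])))
            i = j
        else:
            i += 1
    return candidates


def link_candidate(candidate, kb):
    """Step 2 (stand-in): link by exact match on the lowercased surface form."""
    entry = kb.get(candidate.text.lower())
    if entry is not None:
        candidate.kb_id = entry[0]
    return candidate


def classify_candidate(candidate, kb, fallback="Unknown"):
    """Step 3 (stand-in): read the fine-grained type from the linked entry."""
    entry = kb.get(candidate.text.lower()) if candidate.kb_id else None
    candidate.label = entry[1] if entry is not None else fallback
    return candidate


def run_cascade(sentence, kb=TOY_KB):
    """Run the three steps in order on a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return [
        classify_candidate(link_candidate(c, kb), kb)
        for c in detect_candidates(tokens)
    ]


if __name__ == "__main__":
    for c in run_cascade("Marie Curie was born in Warsaw"):
        print(c.text, c.kb_id, c.label)
```

In this sketch, a candidate that cannot be linked falls back to the label `Unknown`, which mirrors the abstract's point that the fine-grained decision depends on what the knowledge base provides. A realistic system would replace each stand-in with a learned component.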