DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition

The MultiCoNER \RNum{2} shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios, and it inherits the semantic ambiguity and low-context setting of the MultiCoNER \RNum{1} task. To cope with these problems, the previous top systems in MultiCoNER \RNum{1} incorporate either knowledge bases or gazetteers. However, they still suffer from insufficient knowledge, limited context length, and a single retrieval strategy. In this paper, our team \textbf{DAMO-NLP} proposes a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER. We perform error analysis on the previous top systems and reveal that their performance bottleneck lies in insufficient knowledge. We also find that the limited context length makes the retrieved knowledge invisible to the model. To enhance the retrieval context, we incorporate the entity-centric Wikidata knowledge base and use the infusion approach to broaden the contextual scope of the model. We further explore various search strategies and refine the quality of the retrieved knowledge. Our system\footnote{We will release the dataset, code, and scripts of our system at {\small \url{https://github.com/modelscope/AdaSeq/tree/master/examples/U-RaNER}}.} wins 9 out of 13 tracks in the MultiCoNER \RNum{2} shared task. Additionally, we compare our system with ChatGPT, one of the large language models that have unlocked strong capabilities on many tasks. The results show that ChatGPT still has much room for improvement on the extraction task.
