This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2, Multilingual Complex Named Entity Recognition (MultiCoNER II). We propose a method named Statistical Construction and Dual Adaptation of Gazetteer (SCDAG) for multilingual complex NER. The method first constructs a gazetteer with a statistics-based approach. Second, it adapts the representations of the gazetteer network and the language model by minimizing the KL divergence between them at both the sentence level and the entity level. Finally, the two networks are integrated for supervised named entity recognition (NER) training. The proposed method is applied to XLM-R with a gazetteer built from Wikidata and generalizes well across different tracks. Experimental results and detailed analysis verify the effectiveness of the proposed method. According to the official results, our system ranked 1st on one track (Hindi) in this task.
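The dual adaptation step can be illustrated with a minimal sketch. The abstract states only that a KL divergence between the gazetteer network and the language model is minimized at the sentence level and the entity level, so everything else here is an assumption made for illustration: the mean pooling, the direction of the KL term, the unweighted sum of the two terms, and all function and parameter names (`dual_adaptation_loss`, `lm_logits`, `gaz_logits`, `entity_spans`) are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_pool(token_logits):
    """Average per-token score vectors into one vector (assumed pooling)."""
    n = len(token_logits)
    return [sum(column) / n for column in zip(*token_logits)]


def dual_adaptation_loss(lm_logits, gaz_logits, entity_spans):
    """Sentence-level plus entity-level KL between the two networks' outputs.

    lm_logits, gaz_logits: one score vector per token (hypothetical inputs).
    entity_spans: list of (start, end) token indices, end exclusive.
    """
    # Sentence level: compare distributions pooled over all tokens.
    # The KL direction (gazetteer || language model) is an assumption.
    sentence_loss = kl_divergence(
        softmax(mean_pool(gaz_logits)), softmax(mean_pool(lm_logits))
    )

    # Entity level: compare distributions pooled over each entity span.
    entity_losses = [
        kl_divergence(
            softmax(mean_pool(gaz_logits[start:end])),
            softmax(mean_pool(lm_logits[start:end])),
        )
        for start, end in entity_spans
    ]
    entity_loss = sum(entity_losses) / len(entity_losses) if entity_losses else 0.0

    # Unweighted sum of the two terms (assumed; the weighting is not specified).
    return sentence_loss + entity_loss
```

When both networks produce identical scores the loss is zero, and it grows as their pooled distributions diverge. In a real system the two inputs would presumably be tensors produced by XLM-R and the gazetteer network, and the loss would be minimized with a deep learning framework rather than computed in plain Python.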