We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations that facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets, spanning 12 diverse languages, annotated with named entities under a cross-lingually consistent schema. In this paper, we detail the creation and composition of the UNER datasets, and we provide initial modeling baselines in both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.