DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition

Advances in Large Language Models (LLMs) have spurred growing interest in applying them to Named Entity Recognition (NER). However, existing datasets were designed primarily for traditional machine learning methods and are inadequate for LLM-based methods in both corpus selection and overall dataset design. Moreover, the fixed and relatively coarse-grained entity categorization prevalent in existing datasets cannot adequately assess the stronger generalization and contextual understanding of LLM-based methods, which hinders a comprehensive demonstration of their broad application prospects. To address these limitations, we propose DynamicNER, the first NER dataset designed for LLM-based methods with dynamic categorization: the same entity can be assigned different entity types, drawn from different entity type lists, in different contexts, which better exercises the generalization of LLM-based NER. The dataset is also multilingual and multi-granular, covering 8 languages and 155 entity types, with corpora spanning a diverse range of domains. In addition, we introduce CascadeNER, a novel NER method based on a two-stage strategy and lightweight LLMs, which achieves higher accuracy on fine-grained tasks while requiring fewer computational resources. Experiments show that DynamicNER serves as a robust and effective benchmark for LLM-based NER methods. We also analyze traditional and LLM-based methods on our dataset. Our code and dataset are openly available at https://github.com/Astarojth/DynamicNER.
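
To make the idea of dynamic categorization more concrete, here is a minimal sketch. It is not the dataset's actual schema: the field names (`text`, `type_list`, `entities`), the example sentences, and the type names are all assumptions chosen for illustration. It only shows how the same mention can receive different labels when each sample carries its own candidate type list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Sample:
    """Hypothetical sample: a sentence, its own candidate type list, and gold labels."""
    text: str
    type_list: List[str]       # candidate entity types offered for this sample only
    entities: Dict[str, str]   # mention -> gold type, expected to come from type_list


def validate(sample: Sample) -> bool:
    """Return True if every mention occurs in the text and every label is in type_list."""
    return all(
        mention in sample.text and label in sample.type_list
        for mention, label in sample.entities.items()
    )


def labels_for(mention: str, samples: List[Sample]) -> Set[str]:
    """Collect the distinct labels a mention receives across samples."""
    return {s.entities[mention] for s in samples if mention in s.entities}


# Illustrative data (invented, not taken from DynamicNER): the same mention
# is labeled against a coarse type list in one sample and a finer one in another.
SAMPLES = [
    Sample(
        text="Apple opened a new office in Berlin.",
        type_list=["person", "organization", "location"],
        entities={"Apple": "organization", "Berlin": "location"},
    ),
    Sample(
        text="Apple unveiled a faster chip for its laptops.",
        type_list=["technology company", "university", "politician"],
        entities={"Apple": "technology company"},
    ),
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(validate(sample), sample.entities)
    print(sorted(labels_for("Apple", SAMPLES)))
```

Under this assumed layout, an evaluation harness would pass each sample's `type_list` to the model alongside the text, so the correct label depends on both the context and the types on offer rather than on one fixed global label set.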

