Much named entity recognition (NER) research focuses on developing dataset-specific models, each trained on data from a single domain of interest and covering a limited set of related entity types. This is costly in practice, because every new dataset requires a new model to be trained and stored. In this work, we present a "versatile" model, the Prompting-based Unified NER system (PUnifiedNER), that works with data from different domains and can recognise up to 37 entity types simultaneously; in principle, it could be extended to more. By using prompt learning, PUnifiedNER offers a novel way to train jointly across multiple corpora and supports on-demand entity recognition. Experimental results show that PUnifiedNER yields significant gains in prediction performance over dataset-specific models while substantially reducing model deployment costs. Furthermore, PUnifiedNER achieves performance competitive with, or even better than, state-of-the-art domain-specific methods on some datasets. We also perform comprehensive pilot and ablation studies to support an in-depth analysis of each component of PUnifiedNER.