Although named entity recognition (NER) helps extract domain-specific entities from text (e.g., artists in the music domain), creating the large amount of training data or the structured knowledge base needed for accurate NER in a target domain is costly. Here, we propose self-adaptive NER, which retrieves external knowledge from unstructured text to learn the usages of entities that the model has not yet learned well. To retrieve knowledge that is useful for NER, we design an effective two-stage model that retrieves unstructured knowledge using uncertain entities as queries. Our model first predicts the entities in the input and identifies those whose predictions have low confidence. It then retrieves knowledge using these uncertain entities as queries and concatenates the retrieved text to the original input to revise its predictions. Experiments on the CrossNER datasets demonstrate that our model outperforms strong baselines by 2.35 F1 points.
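The two-stage loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a generic predictor that returns character spans with confidence scores and a retriever that maps a query string to passages. The confidence `threshold`, `top_k`, the `" [SEP] "` separator, and the toy tagger and in-memory corpus are all hypothetical stand-ins for the actual model and retrieval source.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Span:
    start: int          # character offset where the entity begins
    end: int            # character offset just past the entity
    label: str          # predicted entity type
    confidence: float   # model confidence for this prediction


# Hypothetical interfaces: any NER model and any text retriever fitting these shapes.
Predictor = Callable[[str], List[Span]]
Retriever = Callable[[str], List[str]]


def self_adaptive_ner(text: str, predict: Predictor, retrieve: Retriever,
                      threshold: float = 0.5, top_k: int = 1,
                      sep: str = " [SEP] ") -> List[Span]:
    # Stage 1: predict entities on the raw input.
    first_pass = predict(text)
    # Collect entity strings whose confidence falls below the (assumed) threshold.
    queries: List[str] = []
    for span in first_pass:
        mention = text[span.start:span.end]
        if span.confidence < threshold and mention not in queries:
            queries.append(mention)
    if not queries:
        return first_pass  # every prediction is confident; skip retrieval
    # Stage 2: retrieve passages using the uncertain entities as queries.
    passages: List[str] = []
    for query in queries:
        passages.extend(retrieve(query)[:top_k])
    if not passages:
        return first_pass  # nothing retrieved; keep the original predictions
    # Concatenate the retrieved text to the original input and predict again.
    augmented = text + sep + sep.join(passages)
    second_pass = predict(augmented)
    # Keep only spans inside the original input, not the appended context.
    return [s for s in second_pass if s.end <= len(text)]


def toy_predict(text: str) -> List[Span]:
    """Hypothetical toy tagger: confident about 'Radiohead' only if 'band' appears."""
    start = text.find("Radiohead")
    if start < 0:
        return []
    confidence = 0.9 if "band" in text else 0.3
    return [Span(start, start + len("Radiohead"), "band", confidence)]


def toy_retrieve(query: str) -> List[str]:
    """Hypothetical toy retriever backed by a tiny in-memory corpus."""
    corpus = {"Radiohead": ["Radiohead is an English rock band."]}
    return corpus.get(query, [])


print(self_adaptive_ner("I saw Radiohead live.", toy_predict, toy_retrieve))
# → [Span(start=6, end=15, label='band', confidence=0.9)]
```

In this sketch the low-confidence first prediction (0.3) triggers retrieval, and the appended passage supplies the context that raises the revised prediction to 0.9. Spans that fall in the appended passages are discarded so that only entities in the original input are returned; how the actual model aligns its revised predictions with the input may differ.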