Sequential labeling is the task of predicting a label for each token in a sequence; Named Entity Recognition (NER) is a typical example. NER aims to extract entities from a text and predict their labels, which is important for information extraction. Although previous work has made great progress in improving NER performance, uncertainty estimation for NER (UE-NER) remains underexplored despite being essential. This work focuses on UE-NER, which aims to estimate uncertainty scores for NER predictions. Previous uncertainty estimation models often overlook two unique characteristics of NER: the connection between entities (i.e., the embedding of one entity is learned based on the others) and wrong-span cases in the entity extraction subtask. We therefore propose a Sequential Labeling Posterior Network (SLPN) that estimates uncertainty scores for the extracted entities while accounting for uncertainty transmitted from other tokens. Moreover, we define an evaluation strategy that addresses the specificity of wrong-span cases. SLPN achieves significant improvements on three datasets, such as a 5.54-point improvement in AUPR on the MIT-Restaurant dataset. Our code is available at \url{https://github.com/he159ok/UncSeqLabeling_SLPN}.