Sequential labeling is the task of predicting a label for each token in a sequence; Named Entity Recognition (NER) is a typical example. Given a text, NER extracts entities and predicts their labels, which makes it a core component of information extraction. Although previous work has made great progress in improving NER performance, uncertainty estimation for NER (UE-NER) remains underexplored despite being essential. This work focuses on UE-NER, which aims to estimate uncertainty scores for NER predictions. Previous uncertainty estimation models often overlook two characteristics unique to NER: the connections between entities (i.e., the embedding of one entity is learned based on the others) and wrong-span cases in the entity extraction subtask. We therefore propose a Sequential Labeling Posterior Network (SLPN) that estimates uncertainty scores for the extracted entities while accounting for uncertainty transmitted from other tokens. In addition, we define an evaluation strategy that addresses the specificity of wrong-span cases. SLPN achieves significant improvements on two datasets, including a 5.54-point improvement in AUPR on the MIT-Restaurant dataset.