Jointly extracting entity pairs and their relations is challenging when working on distantly supervised data with ambiguous or noisy labels. To mitigate this impact, we propose uncertainty-aware bootstrap learning, motivated by the intuition that the higher the uncertainty of an instance, the more likely the model's confidence is inconsistent with the ground truth. Specifically, we first explore instance-level data uncertainty to create an initial set of high-confidence examples. This subset filters out noisy instances and helps the model converge quickly at the early stage. During bootstrap learning, we propose self-ensembling as a regularizer to alleviate the inter-model uncertainty produced by noisy labels. We further define the probability variance of joint tagging probabilities to estimate inner-model parametric uncertainty, which is used to select and add new reliable training instances for the next iteration. Experimental results on two large datasets show that our approach outperforms existing strong baselines and related methods.
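To make the uncertainty signals above more concrete, here is a minimal Python sketch under stated assumptions. The abstract gives no formulas, so every choice below is an illustrative assumption rather than the authors' exact method: instance-level data uncertainty is taken as the mean per-token entropy of the tag distribution; the joint tagging probability is the product of per-token probabilities of a tag sequence (assuming token independence); probability variance is the variance of that joint probability over several stochastic forward passes (for example, with dropout left on); and self-ensembling is illustrated as a mean-teacher-style exponential moving average (EMA) of parameters plus a squared-error consistency term. All function names and the `ratio` and `decay` parameters are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: an instance is a list of per-token tag distributions,
# each a list of probabilities summing to 1.
Distribution = Sequence[float]


def token_entropy(p: Distribution) -> float:
    """Shannon entropy (in nats) of one token's tag distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def instance_data_uncertainty(tag_probs: Sequence[Distribution]) -> float:
    """Assumed instance-level data uncertainty: mean token entropy."""
    return sum(token_entropy(p) for p in tag_probs) / len(tag_probs)


def joint_tagging_probability(tag_probs: Sequence[Distribution],
                              tags: Sequence[int]) -> float:
    """Probability of a full tag sequence, assuming independent tokens."""
    prob = 1.0
    for p, t in zip(tag_probs, tags):
        prob *= p[t]
    return prob


def probability_variance(sampled_joint_probs: Sequence[float]) -> float:
    """Population variance of joint probabilities from T stochastic passes."""
    n = len(sampled_joint_probs)
    mean = sum(sampled_joint_probs) / n
    return sum((q - mean) ** 2 for q in sampled_joint_probs) / n


def instance_probability_variance(passes: Sequence[Sequence[Distribution]],
                                  tags: Sequence[int]) -> float:
    """Hypothetical helper: variance of one tag sequence's joint probability,
    where each element of `passes` is the tag_probs from one stochastic pass."""
    return probability_variance(
        [joint_tagging_probability(tp, tags) for tp in passes])


def select_low_uncertainty(scores: Sequence[float], ratio: float) -> List[int]:
    """Indices of the `ratio` fraction of instances with the lowest scores
    (lower score = more confident), ordered from most to least confident."""
    k = max(1, int(len(scores) * ratio))
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


def ema_update(teacher: Sequence[float], student: Sequence[float],
               decay: float = 0.99) -> List[float]:
    """Self-ensembling sketch: teacher weights as an EMA of student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs: Sequence[Distribution],
                     teacher_probs: Sequence[Distribution]) -> float:
    """Mean squared difference between student and teacher tag distributions."""
    total, n = 0.0, 0
    for ps, pt in zip(student_probs, teacher_probs):
        for a, b in zip(ps, pt):
            total += (a - b) ** 2
            n += 1
    return total / n


if __name__ == "__main__":
    # Toy pool of three two-token instances with two possible tags each.
    pool = [
        [[0.9, 0.1], [0.8, 0.2]],
        [[0.5, 0.5], [0.6, 0.4]],
        [[0.99, 0.01], [0.95, 0.05]],
    ]
    scores = [instance_data_uncertainty(x) for x in pool]
    print(select_low_uncertainty(scores, ratio=0.5))  # -> [2]
```

Lower scores mean more confident instances, so in this sketch the same `select_low_uncertainty` routine can seed the initial subset from data uncertainty and, in later iterations, pick new instances ranked by `instance_probability_variance`.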