Self-supervised knowledge-graph completion (KGC) relies on estimating a scoring model over (entity, relation, entity) tuples, for example by embedding an initial knowledge graph. Prediction quality can be improved by calibrating the scoring model, typically by adjusting the prediction thresholds using manually annotated examples. In this paper, we make the first attempt at cold-start calibration for KGC, in which no annotated examples are initially available for calibration and only a limited number of tuples can be selected for annotation. Our new method, ACTC, efficiently finds good per-relation thresholds from a limited set of annotated tuples. In addition to the few annotated tuples, ACTC also leverages unlabeled tuples by estimating their correctness with Logistic Regression or Gaussian Process classifiers. We also experiment with two methods for selecting candidate tuples for annotation: density-based selection and random selection. Experiments with five scoring models and an oracle annotator show an improvement of 7 percentage points when using ACTC in the challenging setting with an annotation budget of only 10 tuples, and an average improvement of 4 percentage points across different budgets.
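The idea of per-relation threshold calibration from a few annotated tuples plus pseudo-labeled unlabeled tuples can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_logistic_1d`, `best_threshold`, `calibrate_relation`), the use of accuracy as the threshold criterion, the one-dimensional logistic regression on the raw score, and the plain gradient-descent fit are all assumptions made for illustration.

```python
import math


def fit_logistic_1d(scores, labels, lr=0.1, epochs=2000):
    """Fit p(correct | score) = sigmoid(w * score + b) by gradient descent.

    Hypothetical helper: a stand-in for the Logistic Regression classifier
    mentioned in the abstract, applied to a single feature (the model score).
    """
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(w * s + b)))
            grad_w += (p - y) * s
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def best_threshold(scores, labels):
    """Return the threshold t maximising accuracy of 'score >= t means correct'.

    Accuracy is an assumed criterion; ties are resolved in favour of the
    smallest candidate threshold.
    """
    candidates = sorted(set(scores))
    candidates.append(max(scores) + 1.0)  # option: reject every tuple
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = hits / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def calibrate_relation(annotated, unlabeled_scores):
    """Estimate a threshold for one relation.

    annotated: list of (score, label) pairs, label in {0, 1}.
    unlabeled_scores: scores of tuples without annotation.
    """
    scores = [s for s, _ in annotated]
    labels = [y for _, y in annotated]
    # A classifier can only be fitted when both classes were annotated;
    # otherwise fall back to the annotated tuples alone.
    if len(set(labels)) == 2:
        w, b = fit_logistic_1d(scores, labels)
        pseudo = [1 if w * s + b >= 0 else 0 for s in unlabeled_scores]
        scores = scores + list(unlabeled_scores)
        labels = labels + pseudo
    return best_threshold(scores, labels)


if __name__ == "__main__":
    annotated = [(0.9, 1), (0.8, 1), (0.3, 0), (0.1, 0)]
    unlabeled = [0.75, 0.7, 0.25, 0.2]
    print(calibrate_relation(annotated, unlabeled))
```

In this sketch the unlabeled tuples only refine where the threshold falls between the highest annotated negative and the lowest annotated positive; a per-relation dictionary of such thresholds would then be applied to the scoring model's outputs.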