Link Prediction (LP) is an essential task over Knowledge Graphs (KGs), traditionally focused on using and predicting the relations between entities. Textual entity descriptions have already been shown to be valuable, but models that incorporate numerical literals have shown only minor improvements on existing benchmark datasets. It is unclear whether such a model is actually better at using numerical literals or better at exploiting the graph structure. This raises doubts about the effectiveness of these methods and about the suitability of the existing benchmark datasets. We propose a methodology to evaluate LP models that incorporate numerical literals. It consists of i) a new synthetic dataset to better understand how well these models use numerical literals and ii) dataset ablation strategies to investigate potential difficulties with the existing datasets. We identify a prevalent trend: many models underutilize literal information and potentially rely on additional parameters for performance gains. Our investigation highlights the need for more extensive evaluations when releasing new models and datasets.