Real-world Knowledge Graphs (KGs) are often incomplete, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs because they require learning dense node embeddings and computing pairwise distances. Generative transformer-based language models (e.g., T5 and the recent KGT5) offer a promising solution, as they can predict tail nodes directly. In this study, we propose including node neighborhoods as additional information to improve language-model-based KGC methods. We examine the effects of this imputation and show that, on both inductive and transductive Wikidata subsets, our method outperforms KGT5 and conventional KGC approaches. We also provide an extensive analysis of the impact of the neighborhood on model predictions and show its importance. Furthermore, we point the way to significantly improving KGC through more effective neighborhood selection.
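To make the idea of supplying a node's neighborhood to a generative language model concrete, the following is a minimal sketch of how a tail-prediction query might be turned into text with neighborhood context appended. The function name `verbalize_query`, the `predict tail:` prefix, the separators, and the `max_neighbors` cutoff are all illustrative assumptions, not the exact input format used in this work or by KGT5.

```python
from typing import List, Tuple

# A triple is (head, relation, tail); this layout is an assumption for the sketch.
Triple = Tuple[str, str, str]


def verbalize_query(
    head: str,
    relation: str,
    known_triples: List[Triple],
    max_neighbors: int = 3,
) -> str:
    """Build a text query asking a seq2seq model for the tail of (head, relation, ?).

    Outgoing edges of `head` found in `known_triples` are appended as context.
    The prompt layout is hypothetical and only illustrates the general idea.
    """
    # Collect the 1-hop outgoing neighborhood of the head node, truncated
    # to the first `max_neighbors` entries (a naive selection strategy).
    context = [
        f"{rel} : {tail}"
        for (h, rel, tail) in known_triples
        if h == head
    ][:max_neighbors]

    prompt = f"predict tail: {head} | {relation}"
    if context:
        prompt += " | context: " + " ; ".join(context)
    return prompt


if __name__ == "__main__":
    kg = [
        ("Marie Curie", "occupation", "physicist"),
        ("Marie Curie", "country of citizenship", "Poland"),
        ("Pierre Curie", "spouse", "Marie Curie"),
    ]
    print(verbalize_query("Marie Curie", "award received", kg))
```

The resulting string would be passed to the model as its input sequence, with the tail entity's name as the target. Truncating to the first few neighbors is the simplest possible selection rule; the closing remark about more effective neighborhood selection suggests that smarter choices here could matter.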