Traditional knowledge graph (KG) completion models learn embeddings to predict missing facts. Recent works attempt to complete KGs in a text-generation manner with large language models (LLMs). However, they must ground the output of LLMs to KG entities, which inevitably introduces errors. In this paper, we present a finetuning framework, DIFT, aiming to unleash the KG completion ability of LLMs while avoiding grounding errors. Given an incomplete fact, DIFT employs a lightweight model to obtain candidate entities and finetunes an LLM with discrimination instructions to select the correct entity from the given candidates. To improve performance while reducing the amount of instruction data, DIFT uses a truncated sampling method to select useful facts for finetuning and injects KG embeddings into the LLM. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed framework.
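The pipeline described above (candidate retrieval by a lightweight model, truncated sampling of finetuning facts, and a discrimination-style instruction) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring function, the top-k cutoff, and the prompt wording are all assumptions introduced here for clarity.

```python
def rank_candidates(query, entities, score_fn, k=5):
    """Use a lightweight embedding model's score_fn to pick top-k candidates.
    score_fn(query, entity) is a placeholder for any KG embedding scorer."""
    ranked = sorted(entities, key=lambda e: score_fn(query, e), reverse=True)
    return ranked[:k]

def keep_for_finetuning(query, gold_entity, entities, score_fn, k=5):
    """Truncated sampling (illustrative): keep a training fact only if the
    lightweight model already ranks the gold entity within the top-k
    candidates, so the LLM sees instructions it can actually answer."""
    return gold_entity in rank_candidates(query, entities, score_fn, k)

def build_instruction(head, relation, candidates):
    """Discrimination instruction (hypothetical wording): the LLM selects
    from given candidates instead of generating free text, so its answer
    is always a valid KG entity and no grounding step is needed."""
    options = "; ".join(candidates)
    return (f"Given the incomplete fact ({head}, {relation}, ?), "
            f"select the correct tail entity from: {options}.")
```

Because the LLM's output space is restricted to the listed candidates, the grounding step (and its errors) that free-form generation would require is avoided by construction.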