Large Language Models (LLMs) possess vast inherent knowledge and superior semantic comprehension capabilities, which have revolutionized a variety of natural language processing tasks. Despite this success, a critical gap remains in enabling LLMs to perform knowledge graph completion (KGC). Empirical evidence shows that LLMs consistently underperform conventional KGC approaches, even with sophisticated prompt design or tailored instruction-tuning. Fundamentally, applying LLMs to KGC introduces several critical challenges, including a vast set of candidate entities, the hallucination problem of LLMs, and under-exploitation of the graph structure. To address these challenges, we propose a novel instruction-tuning-based method, namely FtG. Specifically, we present a \textit{filter-then-generate} paradigm and formulate the KGC task as a multiple-choice question. In this way, we can harness the capabilities of LLMs while mitigating the issues caused by hallucination. Moreover, we devise a flexible ego-graph serialization prompt and employ a structure-text adapter to couple structural and textual information in a contextualized manner. Experimental results demonstrate that FtG achieves substantial performance gains over existing state-of-the-art methods. The instruction dataset and code are available at \url{https://github.com/LB0828/FtG}.