Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination, especially in scenarios that require deep and responsible reasoning. These issues can be partially addressed by introducing external knowledge graphs (KGs) into LLM reasoning. In this paper, we propose a new LLM-KG integration paradigm, ``$\hbox{LLM}\otimes\hbox{KG}$'', which treats the LLM as an agent that interactively explores related entities and relations on KGs and performs reasoning based on the retrieved knowledge. We implement this paradigm with a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on the KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs alone, ToG has stronger deep reasoning ability; 2) ToG supports knowledge traceability and knowledge correctability by leveraging LLM reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs, and prompting strategies without any additional training cost; 4) in certain scenarios, ToG with smaller LLMs can outperform large LLMs such as GPT-4, which reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA on 6 of 9 datasets, whereas most previous SOTA methods rely on additional training.
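The idea of beam search over reasoning paths on a KG can be illustrated with a minimal sketch. It assumes a toy in-memory KG (a dict from head entity to `(relation, tail)` edges) and replaces the LLM agent's relevance judgement with a hypothetical keyword-overlap scorer, `keyword_score`. In ToG the LLM itself scores and prunes candidate relations and entities, so this is an illustration of the search pattern, not the paper's implementation; all names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy KG: head entity -> list of (relation, tail entity) edges.
KG = Dict[str, List[Tuple[str, str]]]
# A reasoning path alternates entity, relation, entity, relation, ...
Path = Tuple[str, ...]


def keyword_score(question: str) -> Callable[[Path], int]:
    """Stand-in for the LLM scorer: count distinct path tokens found in the question."""
    words = set(question.lower().replace("?", "").split())

    def score(path: Path) -> int:
        tokens = {tok for item in path for tok in item.lower().split("_")}
        return len(tokens & words)

    return score


def beam_search_paths(kg: KG, start: str, score: Callable[[Path], int],
                      width: int = 2, depth: int = 2) -> List[Path]:
    """Expand every path in the beam by one hop, then keep the top-`width` paths."""
    beams: List[Path] = [(start,)]
    for _ in range(depth):
        candidates: List[Path] = []
        for path in beams:
            for relation, nxt in kg.get(path[-1], []):
                candidates.append(path + (relation, nxt))
        if not candidates:  # no outgoing edges anywhere: stop early
            break
        candidates.sort(key=score, reverse=True)
        beams = candidates[:width]
    return beams


toy_kg: KG = {
    "Canberra": [("capital_of", "Australia"), ("located_in", "ACT")],
    "Australia": [("head_of_government", "Prime_Minister"),
                  ("currency", "Australian_dollar")],
    "ACT": [("part_of", "Australia")],
}

question = "What is the currency of the country whose capital is Canberra?"
best = beam_search_paths(toy_kg, "Canberra", keyword_score(question))
print(best[0])  # highest-scoring 2-hop path; its last element is the candidate answer
```

Keeping several paths per hop (the beam width) rather than only the single best one is what lets a path that looks weaker early on survive until a later hop reveals it as relevant; in this sketch, paths from a dead-end entity are simply dropped once other paths can still be expanded.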