Although large language models (LLMs) have achieved significant success in various tasks, they often suffer from hallucination, especially in scenarios requiring deep and responsible reasoning. These issues can be partially addressed by introducing external knowledge graphs (KGs) into LLM reasoning. In this paper, we propose a new LLM-KG integration paradigm, ``$\hbox{LLM}\otimes\hbox{KG}$'', which treats the LLM as an agent that interactively explores related entities and relations on KGs and performs reasoning based on the retrieved knowledge. We implement this paradigm with a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on the KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. Through a series of well-designed experiments, we examine and illustrate the following advantages of ToG: 1) compared with LLMs alone, ToG has stronger deep reasoning ability; 2) ToG supports knowledge traceability and knowledge correctability by leveraging LLM reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs, and prompting strategies without any additional training cost; 4) in certain scenarios, ToG with small LLMs can outperform large LLMs such as GPT-4, which reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall state-of-the-art (SOTA) performance on 6 of 9 datasets, whereas most previous SOTA methods rely on additional training.
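To make the beam-search idea concrete, here is a minimal sketch of width-limited, depth-limited path expansion over a toy KG. It is not the authors' implementation. In ToG the LLM itself judges which relations and entities are promising; here a hypothetical `keyword_score` function (simple token overlap with the question) stands in for that LLM call. The names `beam_search_kg`, `keyword_score`, `TOY_KG`, and `QUESTION` are all illustrative assumptions, and the sketch does not model how the LLM decides when enough knowledge has been gathered; it simply stops at a fixed depth.

```python
from typing import Callable, Dict, List, Tuple

# Toy knowledge graph (hypothetical data): head entity -> list of (relation, tail) edges.
KG = Dict[str, List[Tuple[str, str]]]
# A reasoning path alternates entities and relations: [e0, r1, e1, r2, e2, ...].
Path = List[str]

TOY_KG: KG = {
    "Inception": [
        ("directed_by", "Christopher_Nolan"),
        ("release_year", "2010"),
        ("genre", "Science_Fiction"),
    ],
    "Christopher_Nolan": [("born_in", "London"), ("spouse", "Emma_Thomas")],
    "London": [("capital_of", "United_Kingdom")],
}
QUESTION = "Where was the person who directed Inception born?"


def keyword_score(question: str, path: Path) -> float:
    """Hypothetical stand-in for the LLM's judgment: counts path tokens
    (entity/relation names split on underscores) that appear in the question."""
    q_tokens = set(question.lower().replace("?", "").split())
    hits = 0
    for item in path:
        for token in item.lower().split("_"):
            if token in q_tokens:
                hits += 1
    return float(hits)


def beam_search_kg(
    kg: KG,
    question: str,
    topic_entities: List[str],
    score: Callable[[str, Path], float],
    width: int = 2,
    depth: int = 2,
) -> List[Path]:
    """Expand every path in the beam by one hop, keep the top-`width`
    candidates, and repeat for at most `depth` hops."""
    beam: List[Path] = [[e] for e in topic_entities]
    for _ in range(depth):
        candidates: List[Path] = []
        for path in beam:
            for relation, neighbor in kg.get(path[-1], []):
                candidates.append(path + [relation, neighbor])
        if not candidates:
            break  # no outgoing edges left; keep the current beam
        # sort() is stable, so ties keep their original edge order.
        candidates.sort(key=lambda p: score(question, p), reverse=True)
        beam = candidates[:width]
    return beam


if __name__ == "__main__":
    paths = beam_search_kg(TOY_KG, QUESTION, ["Inception"], keyword_score)
    for p in paths:
        print(" -> ".join(p), keyword_score(QUESTION, p))
```

With width 2 and depth 2, the top-ranked path is `Inception -> directed_by -> Christopher_Nolan -> born_in -> London`, and its final entity serves as the answer candidate. Swapping `keyword_score` for a real LLM-based scorer is where a ToG-style system would differ; the surrounding search loop stays the same.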