Large Language Models (LLMs) demonstrate impressive natural language capabilities but often struggle with knowledge-intensive reasoning tasks. Knowledge Base Question Answering (KBQA), which leverages structured Knowledge Graphs (KGs), exemplifies this challenge because it requires accurate multi-hop reasoning. Existing approaches typically perform sequential reasoning steps guided by predefined pipelines, which restricts flexibility and lets errors cascade because each step reasons in isolation. To address these limitations, we propose KG-Hopper, a novel Reinforcement Learning (RL) framework that empowers compact open LLMs to perform integrated multi-hop KG reasoning within a single inference round. Rather than reasoning step by step, we train a Reasoning LLM that embeds the entire KG traversal and decision process into a unified "thinking" stage, enabling global reasoning over cross-step dependencies and dynamic path exploration with backtracking. Experimental results on eight KG reasoning benchmarks show that KG-Hopper, based on a 7B-parameter LLM, consistently outperforms larger multi-step systems (up to 70B) and achieves performance competitive with proprietary models such as GPT-3.5-Turbo and GPT-4o-mini, while remaining compact, open, and data-efficient. The code is publicly available at: https://github.com/Wangshuaiia/KG-Hopper.
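To make "multi-hop path exploration with backtracking" concrete, the following is a minimal sketch using plain depth-first search over a hypothetical toy KG of (head, relation, tail) triples. It is not KG-Hopper's method: KG-Hopper performs this exploration inside an RL-trained LLM's thinking stage, whereas here a symbolic search stands in for it. The names `find_path`, `neighbors`, and `is_answer`, as well as the toy triples, are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy KG: (head, relation, tail) triples, for illustration only.
Triple = Tuple[str, str, str]


def neighbors(kg: List[Triple], entity: str) -> List[Tuple[str, str]]:
    """Return (relation, tail) pairs for the outgoing edges of `entity`."""
    return [(r, t) for h, r, t in kg if h == entity]


def find_path(
    kg: List[Triple],
    start: str,
    is_answer: Callable[[str], bool],
    max_hops: int = 3,
) -> Optional[List[Triple]]:
    """Depth-first multi-hop search with backtracking.

    Returns the first path of triples whose final tail satisfies `is_answer`,
    or None if no such path exists within `max_hops` hops.
    """
    path: List[Triple] = []
    visited = {start}

    def dfs(entity: str, depth: int) -> bool:
        # Stop once we reach an entity that satisfies the answer predicate.
        if depth > 0 and is_answer(entity):
            return True
        if depth == max_hops:
            return False
        for rel, tail in neighbors(kg, entity):
            if tail in visited:
                continue  # avoid cycles along the current path
            visited.add(tail)
            path.append((entity, rel, tail))
            if dfs(tail, depth + 1):
                return True
            # Backtrack: this branch did not lead to an answer.
            path.pop()
            visited.remove(tail)
        return False

    return list(path) if dfs(start, 0) else None


# "In which country was the director of Inception born?"
kg: List[Triple] = [
    ("Inception", "genre", "Science Fiction"),  # dead end, forces a backtrack
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "country", "United Kingdom"),
]
countries = {"United Kingdom", "France"}

path = find_path(kg, "Inception", lambda e: e in countries)
print(path)
```

The search first tries the `genre` edge, finds no onward path, backtracks, and then follows `directed_by`, `born_in`, and `country` to reach the answer in three hops.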