Large Language Models (LLMs) demonstrate impressive natural language capabilities but often struggle with knowledge-intensive reasoning tasks. Knowledge Base Question Answering (KBQA), which leverages structured Knowledge Graphs (KGs), exemplifies this challenge because it requires accurate multi-hop reasoning. Existing approaches typically perform sequential reasoning steps guided by predefined pipelines, which restricts flexibility and causes error cascades because each step is reasoned about in isolation. To address these limitations, we propose KG-Hopper, a novel Reinforcement Learning (RL) framework that enables compact open LLMs to perform integrated multi-hop KG reasoning within a single inference round. Rather than reasoning step by step, we train a Reasoning LLM that embeds the entire KG traversal and decision process into a unified ``thinking'' stage, enabling global reasoning over cross-step dependencies and dynamic path exploration with backtracking. Experimental results on eight KG reasoning benchmarks show that KG-Hopper, based on a 7B-parameter LLM, consistently outperforms larger multi-step systems (up to 70B) and performs competitively with proprietary models such as GPT-3.5-Turbo and GPT-4o-mini, while remaining compact, open, and data-efficient. The code is publicly available at: https://github.com/Wangshuaiia/KG-Hopper.