Knowledge Base Question Answering (KBQA) challenges models to bridge the gap between natural language and strict knowledge graph schemas by generating executable logical forms. While Large Language Models (LLMs) have advanced this field, current approaches typically exhibit one of two failure modes: they either generate hallucinated queries without verifying that the referenced schema elements exist, or fall into rigid, template-based reasoning that imitates synthesized traces without genuinely understanding the environment. To address these limitations, we present \textbf{KBQA-R1}, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning. Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base through a set of executable actions, leveraging Group Relative Policy Optimization (GRPO) to refine its strategy based on concrete execution feedback rather than static supervision. Furthermore, we introduce \textbf{Referenced Rejection Sampling (RRS)}, a data synthesis method that resolves the cold-start problem by strictly aligning reasoning traces with ground-truth action sequences. Extensive experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance, effectively grounding LLM reasoning in verifiable execution.
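For concreteness, the group-relative advantage at the heart of GRPO can be sketched in a few lines: each rollout's execution reward is normalized against the statistics of its own sampled group, so no learned value critic is required. This is a minimal sketch under stated assumptions (scalar per-rollout rewards, e.g., answer F1 from KB execution); the function and variable names are illustrative, not taken from any released code.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Compute group-relative advantages: standardize each rollout's reward
    against the mean and std of its sampled group (no learned critic)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 rollouts sampled for one question; rewards are hypothetical
# execution-feedback scores (e.g., F1 of the answers a query retrieves).
rewards = torch.tensor([1.0, 0.0, 0.5, 0.0])
advantages = grpo_advantages(rewards)  # positive for above-group-average rollouts
```

Rollouts that beat their group's average receive positive advantages and are reinforced; the rest are pushed down, which is what lets execution feedback replace static supervision.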
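Similarly, Referenced Rejection Sampling can be read as a filter over sampled traces: a candidate reasoning trace is accepted only if the action sequence it contains exactly matches the ground-truth reference. The sketch below is an assumption-laden illustration; the `<action>...</action>` markup and the helper names are hypothetical placeholders, not the paper's actual trace format.

```python
import re
from typing import Callable, List

def extract_actions(trace: str) -> List[str]:
    # Pull action invocations out of a trace; the <action>...</action>
    # markup is assumed here purely for illustration.
    return re.findall(r"<action>(.*?)</action>", trace)

def referenced_rejection_sampling(
    question: str,
    gold_actions: List[str],
    sample_fn: Callable[[str], str],
    n_samples: int = 8,
) -> List[str]:
    """Accept only LLM-sampled traces whose extracted action sequence
    matches the ground-truth (reference) action sequence."""
    accepted = []
    for _ in range(n_samples):
        trace = sample_fn(question)  # one sampled reasoning trace
        if extract_actions(trace) == gold_actions:
            accepted.append(trace)
    return accepted
```

Because acceptance is keyed to the reference actions rather than to final-answer agreement alone, the retained traces give the cold-start model reasoning that is consistent with verifiable execution from the first SFT step.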