RPO-RAG: Aligning Small LLMs with Relation-aware Preference Optimization for Knowledge Graph Question Answering

Large Language Models (LLMs) have recently demonstrated remarkable reasoning abilities, yet they still hallucinate on knowledge-intensive tasks. Retrieval-augmented generation (RAG) mitigates this issue by grounding answers in external sources such as knowledge graphs (KGs). However, existing KG-based RAG approaches rely on semantics-unaware path sampling and are only weakly aligned with KG reasoning objectives, which limits further accuracy gains. They also feed retrieved paths directly into the reasoner without organizing them into answer-centered reasoning paths, which hinders small LLMs from leveraging the retrieved knowledge. Furthermore, prior work predominantly relies on large LLMs (e.g., ChatGPT/GPT-4) or assumes backbones with more than 7B parameters, leaving sub-7B models underexplored. We address this gap with RPO-RAG, which is, to the best of our knowledge, the first KG-based RAG framework specifically designed for small LLMs. RPO-RAG introduces three key innovations: (1) a query-path semantic sampling strategy that provides informative supervisory signals; (2) relation-aware preference optimization, which aligns training with intermediate KG reasoning signals (e.g., relations); and (3) an answer-centered prompt design that organizes entities and reasoning paths in an interpretable format. Extensive experiments on two Knowledge Graph Question Answering (KGQA) benchmarks, WebQSP and CWQ, demonstrate that RPO-RAG effectively bridges the performance gap between small and large language models. On WebQSP, it improves F1 by up to 8.8%, reflecting higher answer precision, while on CWQ it achieves new state-of-the-art Hit and F1 among models with fewer than 8B parameters. Overall, RPO-RAG substantially improves the reasoning capability of small LLMs, including those with fewer than 3B parameters, highlighting their potential for resource-efficient, practical on-device KGQA applications.
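
The abstract names relation-aware preference optimization but does not state its objective. As a rough illustration only, assuming it resembles the standard Direct Preference Optimization (DPO) loss, with a gold KG relation as the preferred output and a sampled distractor relation as the dispreferred one, a single preference pair could be scored as below. This is a sketch under that assumption, not the RPO-RAG objective; the function name `relation_dpo_loss`, the `beta` value, and the log-probability inputs are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def relation_dpo_loss(
    policy_logp_pos: float,  # log pi_theta(r+ | question, context): gold relation (assumed preferred)
    policy_logp_neg: float,  # log pi_theta(r- | question, context): distractor relation (assumed dispreferred)
    ref_logp_pos: float,     # same log-probabilities under a frozen reference model
    ref_logp_neg: float,
    beta: float = 0.1,       # hypothetical temperature on the implicit reward
) -> float:
    """Standard DPO loss applied to one (gold relation, distractor relation) pair.

    Hypothetical illustration: RPO-RAG's actual relation-aware objective is not
    specified in the abstract.
    """
    # Implicit reward margin: how much more the policy favors r+ over r-
    # relative to the reference model.
    margin = (policy_logp_pos - ref_logp_pos) - (policy_logp_neg - ref_logp_neg)
    # Loss is -log sigmoid(beta * margin); it shrinks as the margin grows.
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin 0, loss equals log 2.
    print(relation_dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy shifts probability toward the gold relation: loss drops below log 2.
    print(relation_dpo_loss(-0.5, -3.0, -1.0, -1.0))
```

Under this reading, the preference signal comes from intermediate relations rather than only from final answers, which matches the abstract's stated goal of aligning training with intermediate KG reasoning signals.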

翻译：大型语言模型（LLMs）近期展现出卓越的推理能力，但在知识密集型任务中仍存在幻觉问题。检索增强生成（RAG）通过将答案锚定于外部知识源（如知识图谱（KGs））来缓解此问题。然而，现有的基于知识图谱的RAG方法依赖于语义无关的路径采样，且与知识图谱推理目标的弱对齐限制了其准确性的进一步提升。这些方法还将检索到的路径直接输入推理器，而未将其组织成以答案为中心的推理路径，从而阻碍了小型语言模型利用检索知识的能力。此外，先前的研究主要依赖大型语言模型（如ChatGPT/GPT-4）或假设模型参数量超过70亿，导致参数量低于70亿的模型研究不足。据我们所知，我们通过RPO-RAG填补了这一空白，这是首个专门为小型语言模型设计的基于知识图谱的RAG框架。RPO-RAG引入了三项关键创新：（1）一种提供信息性监督信号的查询-路径语义采样策略；（2）一种关系感知的偏好优化方法，使训练与知识图谱推理中间信号（如关系）对齐；（3）一种以答案为中心的提示设计，以可解释的格式组织实体和推理路径。在两个基准知识图谱问答（KGQA）数据集WebQSP和CWQ上进行的大量实验表明，RPO-RAG有效缩小了小型与大型语言模型之间的性能差距。在WebQSP上，其F1分数最高提升8.8%，体现了答案精度的显著提高；在CWQ上，其在参数量低于80亿的模型中，于命中率和F1分数上均取得了新的最优结果。总体而言，RPO-RAG显著提升了小型语言模型（即使在参数量低于30亿的情况下）的推理能力，凸显了其在资源高效且实用的设备端知识图谱问答应用中的潜力。