Large Language Models (LLMs) have recently demonstrated remarkable reasoning abilities, yet they still hallucinate on knowledge-intensive tasks. Retrieval-augmented generation (RAG) mitigates this issue by grounding answers in external sources such as knowledge graphs (KGs). However, existing KG-based RAG approaches rely on semantics-unaware path sampling and are only weakly aligned with KG reasoning objectives, which limits further accuracy gains. They also feed retrieved paths directly into the reasoner without organizing them into answer-centered reasoning paths, hindering small LLMs' ability to leverage the retrieved knowledge. Furthermore, prior work predominantly relies on large LLMs (e.g., ChatGPT/GPT-4) or assumes backbones above 7B parameters, leaving sub-7B models underexplored. We address this gap with RPO-RAG, to the best of our knowledge the first KG-based RAG framework specifically designed for small LLMs. RPO-RAG introduces three key innovations: (1) a query-path semantic sampling strategy that provides informative supervisory signals; (2) relation-aware preference optimization that aligns training with intermediate KG reasoning signals (e.g., relations); and (3) an answer-centered prompt design that organizes entities and reasoning paths in an interpretable format. Extensive experiments on two benchmark Knowledge Graph Question Answering (KGQA) datasets, WebQSP and CWQ, demonstrate that RPO-RAG effectively bridges the performance gap between small and large language models. On WebQSP, it improves F1 by up to 8.8%, reflecting enhanced answer precision; on CWQ, it achieves new state-of-the-art results in both Hit and F1 among models under 8B parameters. Overall, RPO-RAG substantially improves the reasoning capability of small LLMs, even those under 3B parameters, highlighting their potential for resource-efficient and practical on-device KGQA applications.