Large Language Models (LLMs) have recently demonstrated remarkable reasoning abilities, yet they still hallucinate on knowledge-intensive tasks. Retrieval-augmented generation (RAG) mitigates this issue by grounding answers in external sources such as knowledge graphs (KGs). However, existing KG-based RAG approaches rely on semantics-unaware path sampling and are only weakly aligned with KG reasoning objectives, which limits further accuracy gains. They also feed retrieved paths directly into the reasoner without organizing them into answer-centered reasoning paths, hindering small LLMs' ability to leverage the retrieved knowledge. Furthermore, prior work predominantly relies on large LLMs (e.g., ChatGPT/GPT-4) or assumes backbones above 7B parameters, leaving sub-7B models underexplored. We address this gap with RPO-RAG, to the best of our knowledge the first KG-based RAG framework specifically designed for small LLMs. RPO-RAG introduces three key innovations: (1) a query-path semantic sampling strategy that provides informative supervisory signals; (2) relation-aware preference optimization that aligns training with intermediate KG reasoning signals (e.g., relations); and (3) an answer-centered prompt design that organizes entities and reasoning paths in an interpretable format. Extensive experiments on two benchmark Knowledge Graph Question Answering (KGQA) datasets, WebQSP and CWQ, demonstrate that RPO-RAG effectively bridges the performance gap between small and large language models. On WebQSP, it improves F1 by up to 8.8%, reflecting enhanced answer precision; on CWQ, it achieves new state-of-the-art results in both Hit and F1 among models under 8B parameters. Overall, RPO-RAG substantially improves the reasoning capability of small LLMs, even those under 3B parameters, highlighting their potential for resource-efficient and practical on-device KGQA applications.