Standard chunk-based RAG pipelines excel at simple factual retrieval but fail on complex multi-hop queries because they lack structural connectivity. Early strategies that interleave retrieval with reasoning often lack global corpus awareness, while Knowledge Graph (KG)-based RAG performs strongly on complex multi-hop tasks but suffers on fact-oriented single-hop queries. To bridge this gap, we propose a novel RAG framework: ToPG (Traversal over Proposition Graphs). ToPG models its knowledge base as a heterogeneous graph of propositions, entities, and passages, combining the granular fact density of propositions with graph connectivity. We exploit this structure through iterative Suggestion-Selection cycles: the Suggestion phase performs a query-aware traversal of the graph, and the Selection phase uses LLM feedback to prune irrelevant propositions and seed the next iteration. Evaluated on three distinct QA tasks (Simple, Complex, and Abstract QA), ToPG demonstrates strong performance on both accuracy- and quality-based metrics. Overall, ToPG shows that query-aware graph traversal combined with factual granularity is a critical component of efficient structured RAG systems. ToPG is available at https://github.com/idiap/ToPG.
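To make the Suggestion-Selection cycle concrete, the following is a minimal sketch and not the ToPG implementation. It assumes a toy graph in which propositions are linked through shared entities and shared passages, ranks suggestions by plain word overlap with the query, and replaces LLM feedback with a keyword rule. All names (`PROPS`, `suggest`, `select`, `traverse`, `keyword_judge`) and the example data are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Toy data (illustrative only): each proposition lists the entities it
# mentions and the passage it was extracted from.
PROPS = {
    "p1": {"text": "Marie Curie was born in Warsaw",
           "entities": {"Marie Curie", "Warsaw"}, "passage": "d1"},
    "p2": {"text": "Warsaw is the capital of Poland",
           "entities": {"Warsaw", "Poland"}, "passage": "d2"},
    "p3": {"text": "Paris is the capital of France",
           "entities": {"Paris", "France"}, "passage": "d3"},
    "p4": {"text": "Marie Curie won the Nobel Prize in Physics",
           "entities": {"Marie Curie", "Nobel Prize"}, "passage": "d1"},
}

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def overlap(query, text):
    # Assumption: word overlap stands in for a real query-aware score.
    return len(tokens(query) & tokens(text))

def build_index(props):
    # Map every entity node and passage node to the propositions attached to it.
    index = defaultdict(set)
    for pid, p in props.items():
        for ent in p["entities"]:
            index[("entity", ent)].add(pid)
        index[("passage", p["passage"])].add(pid)
    return index

def neighbors(pid, props, index):
    # Propositions reachable through a shared entity or a shared passage.
    p = props[pid]
    keys = [("entity", e) for e in p["entities"]] + [("passage", p["passage"])]
    out = set()
    for key in keys:
        out |= index[key]
    return out - {pid}

def suggest(seeds, visited, props, index, query, k=3):
    # Suggestion: expand from the seeds, then rank unseen candidates by query overlap.
    candidates = set()
    for pid in seeds:
        candidates |= neighbors(pid, props, index)
    candidates -= visited
    ranked = sorted(candidates, key=lambda c: (-overlap(query, props[c]["text"]), c))
    return ranked[:k]

def select(candidates, props, query, judge):
    # Selection: keep only what the judge accepts (an LLM call in a real system).
    return [c for c in candidates if judge(query, props[c]["text"])]

def traverse(query, props, judge, max_iters=3):
    index = build_index(props)
    seed = max(sorted(props), key=lambda pid: overlap(query, props[pid]["text"]))
    collected, seeds, visited = [seed], [seed], {seed}
    for _ in range(max_iters):
        suggested = suggest(seeds, visited, props, index, query)
        visited.update(suggested)
        seeds = select(suggested, props, query, judge)  # survivors seed the next cycle
        if not seeds:
            break
        collected.extend(seeds)
    return collected

def keyword_judge(query, text):
    # Toy stand-in for LLM feedback: keep propositions about capitals or countries.
    return any(w in tokens(text) for w in ("capital", "country"))

result = traverse("In which country was Marie Curie born?", PROPS, keyword_judge)
print(result)
```

In this toy run the traversal starts from the proposition that best matches the query, reaches the second hop through the shared entity "Warsaw", and the judge prunes the Nobel Prize proposition even though it overlaps more with the query wording. A real system would presumably replace both the overlap score and the keyword rule with learned or LLM-based components.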