Retrieval-Augmented Generation (RAG) has become the standard approach for equipping Large Language Models (LLMs) with up-to-date knowledge. However, standard RAG, which relies on independent passage retrieval, often fails to capture the interconnected nature of information required for complex multi-hop reasoning. Structured RAG methods attempt to address this with knowledge graphs built from triples, but we argue that the inherent context loss of triples (context collapse) limits the fidelity of the knowledge representation. We introduce PropRAG, a novel RAG framework that shifts from triples to context-rich propositions and performs an efficient, LLM-free online beam search over proposition paths to discover multi-step reasoning chains. By coupling this higher-fidelity knowledge representation with explicit path discovery, PropRAG achieves state-of-the-art zero-shot Recall@5 and F1 scores on 2Wiki, HotpotQA, and MuSiQue, advancing non-parametric knowledge integration through richer representation and efficient reasoning-path discovery.
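The online beam search over proposition paths can be sketched as follows. This is a minimal illustration under assumed simplifications: the proposition store, the entity-overlap linking rule, and the query-coverage scoring heuristic are all hypothetical stand-ins, not PropRAG's actual implementation.

```python
# Hypothetical toy proposition store: each proposition keeps its full
# sentence (context-rich, unlike a bare triple) plus the entities it mentions.
PROPS = {
    "p1": {"text": "Alice founded Acme in 1999.", "entities": {"Alice", "Acme"}},
    "p2": {"text": "Acme is headquartered in Oslo.", "entities": {"Acme", "Oslo"}},
    "p3": {"text": "Oslo is the capital of Norway.", "entities": {"Oslo", "Norway"}},
    "p4": {"text": "Alice studied physics.", "entities": {"Alice"}},
}

def linked(a, b):
    """Assumed linking rule: two propositions connect if they share an entity."""
    return bool(PROPS[a]["entities"] & PROPS[b]["entities"])

def score(path, query_entities):
    """Toy relevance heuristic: number of query entities the path covers."""
    covered = set().union(*(PROPS[p]["entities"] for p in path))
    return len(covered & query_entities)

def beam_search(query_entities, beam_width=2, max_hops=3):
    # Seed beams with single-proposition paths ranked by relevance.
    beams = sorted(([p] for p in PROPS),
                   key=lambda path: score(path, query_entities),
                   reverse=True)[:beam_width]
    for _ in range(max_hops - 1):
        # Extend each surviving path by one linked, unused proposition.
        candidates = []
        for path in beams:
            for p in PROPS:
                if p not in path and linked(path[-1], p):
                    candidates.append(path + [p])
        if not candidates:
            break
        # Keep only the top-scoring paths; no LLM call is needed online.
        beams = sorted(candidates,
                       key=lambda path: score(path, query_entities),
                       reverse=True)[:beam_width]
    return beams

# Multi-hop query linking "Alice" to "Norway": the search chains
# p1 -> p2 -> p3 through the shared entities Acme and Oslo.
paths = beam_search({"Alice", "Norway"})
```

Because scoring relies only on cheap overlap statistics rather than model calls, the path search stays efficient at query time; the chained propositions then serve as the retrieved evidence.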