In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. The framework consists of five distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. These stages respectively narrow the search space, prioritize semantically relevant documents, align with the large language model's (LLM) preferences, support complex chain-of-thought (CoT) methods, and combine information from multiple sources. Our ranking stage introduces a significant innovation by recognizing that semantic relevance alone may not improve generation quality, owing to LLMs' sensitivity to the order of few-shot prompts, as noted in previous research. This critical aspect is often overlooked in current RAG frameworks. We argue that the misalignment between LLMs and external knowledge ranking methods stems from the model-centric paradigm dominant in RAG systems. We propose a content-centric approach instead, emphasizing seamless integration between LLMs and external information sources to optimize content transformation for specific tasks. Our novel ranking stage is designed specifically for RAG systems, incorporating principles of information retrieval while accounting for the unique business scenarios reflected in LLM preferences and user feedback. Simulating feedback signals on the MMLU benchmark yields a 9.3% performance improvement. Our model and code will be open-sourced on GitHub. Additionally, experiments on real-world, large-scale data validate the scalability of our framework.
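To make the five-stage pipeline concrete, here is a minimal sketch of how the stages could compose. This is not the paper's implementation: the stage functions, the lexical-overlap matcher, the length-normalized pre-ranker, and the majority-vote aggregator are all illustrative stand-ins for the actual retrieval, ranking, and feedback-driven components the abstract describes.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    score: float = 0.0

def matching(query: str, corpus: list[str], k: int = 50) -> list[Doc]:
    # Stage 1: narrow the search space with a cheap lexical-overlap filter
    # (a stand-in for a real large-scale retriever).
    qt = set(query.lower().split())
    hits = [Doc(d, len(qt & set(d.lower().split()))) for d in corpus]
    return sorted(hits, key=lambda d: -d.score)[:k]

def pre_ranking(docs: list[Doc], k: int = 10) -> list[Doc]:
    # Stage 2: prioritize semantically relevant documents; here we merely
    # normalize the lexical score by document length as a placeholder.
    for d in docs:
        d.score = d.score / max(len(d.text.split()), 1)
    return sorted(docs, key=lambda d: -d.score)[:k]

def ranking(docs: list[Doc], k: int = 3) -> list[Doc]:
    # Stage 3: order the few-shot examples presented to the LLM. The paper
    # argues this ordering should reflect LLM preferences and user feedback,
    # not semantic relevance alone; this sketch keeps the incoming order.
    return docs[:k]

def reasoning(query: str, docs: list[Doc]) -> str:
    # Stage 4: assemble a chain-of-thought style prompt from the ranked docs.
    ctx = "\n".join(f"[{i + 1}] {d.text}" for i, d in enumerate(docs))
    return f"Context:\n{ctx}\n\nQuestion: {query}\nLet's think step by step."

def aggregating(answers: list[str]) -> str:
    # Stage 5: combine information from multiple sources (majority vote here).
    return max(set(answers), key=answers.count)

if __name__ == "__main__":
    corpus = [
        "Pistis symbolized good faith in Greek mythology.",
        "RAG augments generation with retrieved documents.",
        "Prompt order affects few-shot performance.",
    ]
    query = "What did Pistis symbolize?"
    prompt = reasoning(query, ranking(pre_ranking(matching(query, corpus))))
    print(prompt)
```

The point of the sketch is the composition: each stage consumes the previous stage's output, so a learned ranker or feedback signal can be swapped in at stage 3 without touching retrieval or prompt assembly.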