Retrieval-augmented generation (RAG) has emerged as one of the most prominent applications of vector databases. By integrating documents retrieved from a database into the prompt of a large language model (LLM), RAG enables more reliable and informative content generation. While there has been extensive research on vector databases, many open research problems remain once they are considered in the wider context of end-to-end RAG pipelines. One practical yet challenging problem is how to jointly optimize system performance and generation quality in RAG. This is significantly more complex than it appears because of the numerous knobs on both the algorithmic side (spanning models and databases) and the systems side (from software to hardware). In this paper, we present RAG-Stack, a three-pillar blueprint for quality-performance co-optimization in RAG systems. RAG-Stack comprises: (1) RAG-IR, an intermediate representation that serves as an abstraction layer to decouple quality and performance aspects; (2) RAG-CM, a cost model for estimating system performance given a RAG-IR; and (3) RAG-PE, a plan exploration algorithm that searches for high-quality, high-performance RAG configurations. We believe this three-pillar blueprint will become the de facto paradigm for RAG quality-performance co-optimization in the years to come.
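To make the division of labor among the three pillars concrete, the following is a minimal sketch and not the RAG-Stack implementation. Every name in it (`RagPlan`, `estimate_latency_ms`, `estimate_quality`, `pareto_front`, `explore`) is hypothetical, the knobs are an assumed subset of what an intermediate representation might expose, and all coefficients are invented for illustration. It only shows the shape of the idea: a plan description stands in for RAG-IR, an analytical latency estimate stands in for RAG-CM, and an exhaustive Pareto filter stands in for RAG-PE.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RagPlan:
    """Hypothetical stand-in for one RAG-IR instance (assumed knobs only)."""
    top_k: int             # documents retrieved per query
    nprobe: int            # index partitions scanned by the vector search
    model_params_b: float  # generator size, in billions of parameters


def estimate_latency_ms(plan: RagPlan) -> float:
    """Toy cost model in the role of RAG-CM; coefficients are made up."""
    retrieval = 0.5 * plan.nprobe                     # search cost grows with nprobe
    prefill = 2.0 * plan.top_k * plan.model_params_b  # longer prompt, larger model
    decode = 30.0 * plan.model_params_b               # fixed-length answer assumed
    return retrieval + prefill + decode


def estimate_quality(plan: RagPlan) -> float:
    """Placeholder quality proxy with diminishing returns per knob.

    In practice quality would be measured on a benchmark, not computed.
    """
    recall = plan.nprobe / (plan.nprobe + 8)
    context = plan.top_k / (plan.top_k + 4)
    capacity = plan.model_params_b / (plan.model_params_b + 4)
    return recall * context * capacity


def pareto_front(plans):
    """Keep plans that no other plan beats on both latency and quality."""
    scored = [(estimate_latency_ms(p), estimate_quality(p), p) for p in plans]
    front = []
    for lat, qual, plan in scored:
        dominated = any(
            (l2 <= lat and q2 >= qual) and (l2 < lat or q2 > qual)
            for l2, q2, _ in scored
        )
        if not dominated:
            front.append((lat, qual, plan))
    return sorted(front, key=lambda t: t[0])  # cheapest plan first


def explore():
    """Exhaustive search over a tiny grid, in the role of RAG-PE."""
    space = [
        RagPlan(k, n, m)
        for k, n, m in product([2, 8, 32], [4, 16, 64], [1.0, 8.0])
    ]
    return pareto_front(space)


if __name__ == "__main__":
    for lat, qual, plan in explore():
        print(f"{lat:7.1f} ms  quality={qual:.3f}  {plan}")
```

Exhaustive enumeration is only viable for a grid this small. With realistic numbers of knobs, a plan explorer would need pruning or a guided search, and the placeholder quality proxy would be replaced by measured generation quality.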