Retrieval-Augmented Generation (RAG) is a promising technique for mitigating two key limitations of large language models (LLMs): outdated information and hallucinations. A RAG system stores documents as embedding vectors in a database. Given a query, it runs a search to find the most closely related documents, then inserts the top-ranked matches into the LLM's prompt to generate a response. Efficient and accurate search is therefore critical for RAG to obtain relevant information. We propose a cost-effective search algorithm for the retrieval process. Our progressive search algorithm incrementally refines the candidate set through a hierarchy of searches, starting from low-dimensional embeddings and progressing to the higher target dimensionality. This multi-stage approach reduces retrieval time while preserving the desired accuracy. Our findings demonstrate that progressive search in RAG systems achieves a balance between dimensionality, speed, and accuracy, enabling scalable, high-performance retrieval even for large databases.
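The multi-stage refinement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `progressive_search`, the stage dimensionalities in `dims`, and the candidate-set sizes in `keep` are all hypothetical. It also assumes that the leading coordinates of each stored vector form a usable lower-dimensional embedding, which the abstract does not state, and it uses brute-force cosine similarity at every stage rather than any particular vector index.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def progressive_search(query, docs, dims=(16, 64, 256), keep=(200, 50), top_k=5):
    """Return the indices of the top_k documents, best match first.

    dims: dimensionalities searched in order; the last is the target
          dimensionality (illustrative values, not from the paper).
    keep: number of candidates retained after each non-final stage
          (illustrative values, one entry per non-final stage).
    """
    candidates = np.arange(docs.shape[0])
    for stage, d in enumerate(dims):
        # Assumption: the first d coordinates act as a d-dimensional embedding.
        q = normalize(query[:d])
        sub = normalize(docs[candidates, :d])
        scores = sub @ q

        # Shrink the candidate set; the final stage keeps only top_k.
        is_last = stage == len(dims) - 1
        n = top_k if is_last else keep[stage]
        n = min(n, len(candidates))
        best = np.argsort(-scores)[:n]
        candidates = candidates[best]
    return candidates
```

In this sketch only the first stage touches every document, and it does so at the lowest dimensionality; the full-dimensional comparison runs on the few candidates that survive. The values chosen for `keep` control the trade-off: retaining more candidates per stage costs more time but lowers the chance of discarding a true match early.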