Large Language Models (LLMs) have demonstrated impressive abilities in generation and reasoning tasks but struggle to incorporate up-to-date knowledge, leading to inaccuracies or hallucinations. Retrieval-Augmented Generation (RAG) mitigates this by retrieving external knowledge and incorporating it into the input prompt. In particular, due to LLMs' context window limitations and long-context hallucinations, only the most relevant "chunks" are retrieved. However, current RAG systems face three key challenges: (1) chunks are often retrieved independently, without considering their relationships such as redundancy and ordering; (2) the utility of chunks is non-monotonic, as adding more chunks can degrade quality; and (3) retrieval strategies fail to adapt to the unique characteristics of different queries. To overcome these challenges, we design a cost-constrained retrieval optimization framework for RAG. We adopt a Monte Carlo Tree Search (MCTS) based strategy to find the optimal chunk combination and ordering, taking the correlations among chunks into account. In addition, to address the non-monotonicity of chunk utility, instead of treating budget exhaustion as the termination condition, we design a utility computation strategy that identifies the optimal chunk combination without necessarily exhausting the budget. Furthermore, we propose a configuration agent that predicts the optimal configuration for each query domain, improving our framework's adaptability and efficiency. Experimental results demonstrate up to a 30% improvement over baseline models, highlighting the framework's effectiveness, scalability, and applicability. Our source code has been released at https://github.com/wang0702/CARROT.
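The core idea of cost-constrained MCTS chunk selection can be sketched as follows. This is a minimal illustrative toy, not CARROT's actual implementation: the chunk data, the redundancy-decayed utility with a context-length penalty (which makes utility non-monotonic), and all parameter values are assumptions for demonstration. Note how the search tracks the best combination seen at any point, rather than only scoring states that exhaust the budget.

```python
import math
import random

# Toy corpus: each chunk has a retrieval cost, a relevance score, and a topic.
# These values are illustrative assumptions, not real retrieval scores.
CHUNKS = [
    {"cost": 2, "rel": 0.9, "topic": "a"},
    {"cost": 2, "rel": 0.8, "topic": "a"},  # redundant with chunk 0
    {"cost": 3, "rel": 0.7, "topic": "b"},
    {"cost": 1, "rel": 0.3, "topic": "c"},  # weakly relevant filler
]
BUDGET = 6

def utility(seq):
    """Toy utility: repeated topics contribute with strong decay (redundancy),
    and every chunk pays a context-length penalty, so utility is non-monotonic
    and the best combination need not exhaust the budget."""
    seen, u = {}, 0.0
    for i in seq:
        t = CHUNKS[i]["topic"]
        u += CHUNKS[i]["rel"] * (0.2 ** seen.get(t, 0))
        seen[t] = seen.get(t, 0) + 1
    return u - 0.35 * len(seq)

def actions(seq):
    """Chunks not yet selected that still fit within the cost budget."""
    spent = sum(CHUNKS[i]["cost"] for i in seq)
    return [i for i in range(len(CHUNKS))
            if i not in seq and spent + CHUNKS[i]["cost"] <= BUDGET]

class Node:
    def __init__(self, seq, parent=None):
        self.seq, self.parent = seq, parent
        self.children, self.untried = {}, actions(seq)
        self.visits, self.value = 0, 0.0

def mcts(iters=400, c=1.4, rng=random.Random(0)):
    root, best = Node(()), ((), 0.0)
    for _ in range(iters):
        node = root
        # Selection: descend via UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children.values(),
                       key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # Expansion: add one random untried chunk as a child.
        if node.untried:
            a = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.seq + (a,), node)
            node.children[a] = child
            node = child
        # Rollout: greedily extend with random affordable chunks.
        seq = node.seq
        while actions(seq):
            seq = seq + (rng.choice(actions(seq)),)
        # Track the best prefix seen anywhere, not just budget-exhausting leaves.
        for end in range(len(seq) + 1):
            cand = seq[:end]
            cu = utility(cand)
            if cu > best[1]:
                best = (cand, cu)
        # Backpropagation of the rollout's utility.
        r = utility(seq)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best

seq, u = mcts()
```

In this toy instance the search settles on chunks 0 and 2 (one relevant chunk per topic, total cost 5 < budget 6): adding the redundant chunk 1 or the weak chunk 3 lowers utility, so the best combination deliberately leaves budget unspent.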