Large Language Models (LLMs) have demonstrated impressive abilities in generation and reasoning tasks but struggle to incorporate up-to-date knowledge, leading to inaccuracies or hallucinations. Retrieval-Augmented Generation (RAG) mitigates this by retrieving external knowledge and incorporating it into the input prompt. In particular, due to LLMs' context window limitations and long-context hallucinations, only the most relevant "chunks" are retrieved. However, current RAG systems face three key challenges: (1) chunks are often retrieved independently, without considering their relationships such as redundancy and ordering; (2) the utility of chunks is non-monotonic, as adding more chunks can degrade quality; and (3) retrieval strategies fail to adapt to the unique characteristics of different queries. To overcome these challenges, we design a cost-constrained retrieval optimization framework for RAG. We adopt a Monte Carlo Tree Search (MCTS) based strategy to find the optimal chunk combination and ordering, taking the correlations among chunks into account. In addition, to address the non-monotonicity of chunk utility, instead of treating budget exhaustion as the termination condition, we design a utility computation strategy that identifies the optimal chunk combination without necessarily exhausting the budget. Furthermore, we propose a configuration agent that predicts the optimal configuration for each query domain, improving our framework's adaptability and efficiency. Experimental results demonstrate up to a 30% improvement over baseline models, highlighting the framework's effectiveness, scalability, and applicability. Our source code has been released at https://github.com/wang0702/CARROT.
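The core idea of cost-constrained MCTS chunk selection can be sketched as follows. This is a minimal illustrative toy, not CARROT's actual implementation: the chunk data, the redundancy-decayed utility with a context-length penalty (which makes utility non-monotonic), and all parameter values are assumptions for demonstration. Note how the search tracks the best combination seen at any point, rather than only scoring states that exhaust the budget.

```python
import math
import random

# Toy corpus: each chunk has a retrieval cost, a relevance score, and a topic.
# These values are illustrative assumptions, not real retrieval scores.
CHUNKS = [
    {"cost": 2, "rel": 0.9, "topic": "a"},
    {"cost": 2, "rel": 0.8, "topic": "a"},  # redundant with chunk 0
    {"cost": 3, "rel": 0.7, "topic": "b"},
    {"cost": 1, "rel": 0.3, "topic": "c"},  # weakly relevant filler
]
BUDGET = 6

def utility(seq):
    """Toy utility: repeated topics contribute with strong decay (redundancy),
    and every chunk pays a context-length penalty, so utility is non-monotonic
    and the best combination need not exhaust the budget."""
    seen, u = {}, 0.0
    for i in seq:
        t = CHUNKS[i]["topic"]
        u += CHUNKS[i]["rel"] * (0.2 ** seen.get(t, 0))
        seen[t] = seen.get(t, 0) + 1
    return u - 0.35 * len(seq)

def actions(seq):
    """Chunks not yet selected that still fit within the cost budget."""
    spent = sum(CHUNKS[i]["cost"] for i in seq)
    return [i for i in range(len(CHUNKS))
            if i not in seq and spent + CHUNKS[i]["cost"] <= BUDGET]

class Node:
    def __init__(self, seq, parent=None):
        self.seq, self.parent = seq, parent
        self.children, self.untried = {}, actions(seq)
        self.visits, self.value = 0, 0.0

def mcts(iters=400, c=1.4, rng=random.Random(0)):
    root, best = Node(()), ((), 0.0)
    for _ in range(iters):
        node = root
        # Selection: descend via UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children.values(),
                       key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # Expansion: add one random untried chunk as a child.
        if node.untried:
            a = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.seq + (a,), node)
            node.children[a] = child
            node = child
        # Rollout: greedily extend with random affordable chunks.
        seq = node.seq
        while actions(seq):
            seq = seq + (rng.choice(actions(seq)),)
        # Track the best prefix seen anywhere, not just budget-exhausting leaves.
        for end in range(len(seq) + 1):
            cand = seq[:end]
            cu = utility(cand)
            if cu > best[1]:
                best = (cand, cu)
        # Backpropagation of the rollout's utility.
        r = utility(seq)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best

seq, u = mcts()
```

In this toy instance the search settles on chunks 0 and 2 (one relevant chunk per topic, total cost 5 < budget 6): adding the redundant chunk 1 or the weak chunk 3 lowers utility, so the best combination deliberately leaves budget unspent.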