Large Language Models (LLMs) have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to input the entire external database context directly into the model. Instead, only the most relevant pieces of information, referred to as chunks, are selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility; traditional methods that emphasize maximizing the number of included chunks can therefore inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose CORAG, a cost-constrained retrieval optimization system for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS)-based policy framework to find optimal chunk combinations sequentially, enabling a comprehensive consideration of the correlations among chunks. Additionally, rather than treating budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.
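The MCTS-based search over chunk combinations can be illustrated with a minimal, self-contained sketch. This is not the CORAG implementation itself: the chunk costs, the `utility` function, and the UCB1 exploration constant `c` are illustrative placeholders. The sketch does show the two ideas the abstract emphasizes, namely that chunks are selected as combinations rather than independently, and that the budget is part of the search itself (every candidate expansion must fit the remaining budget) rather than a stopping condition, so a non-maximal set can win when utility is non-monotonic.

```python
import math
import random

def mcts_select_chunks(chunks, utility, budget, iters=300, c=1.4):
    """Search for a high-utility chunk combination under a cost budget.

    chunks  -- list of (chunk_id, cost) pairs
    utility -- maps a frozenset of chunk_ids to a score; may be
               non-monotonic (adding a chunk can lower the score)
    budget  -- maximum total cost of the selected combination
    """
    class Node:
        def __init__(self, selected, cost, parent=None):
            self.selected, self.cost, self.parent = selected, cost, parent
            self.children = {}   # chunk_id -> child Node
            self.visits, self.value = 0, 0.0

        def candidates(self):
            # Unselected chunks that still fit within the budget.
            return [(cid, cc) for cid, cc in chunks
                    if cid not in self.selected and self.cost + cc <= budget]

    root = Node(frozenset(), 0.0)
    best, best_score = root.selected, utility(root.selected)

    for _ in range(iters):
        node = root
        # Selection: descend via UCB1 while the node is fully expanded.
        while node.candidates() and len(node.children) == len(node.candidates()):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits + 1) / ch.visits))
        # Expansion: attach one untried chunk as a child.
        untried = [cand for cand in node.candidates()
                   if cand[0] not in node.children]
        if untried:
            cid, cc = random.choice(untried)
            child = Node(node.selected | {cid}, node.cost + cc, node)
            node.children[cid] = child
            node = child
        # The expanded combination itself is a candidate answer: with a
        # non-monotonic utility, a non-maximal set can be the optimum.
        score_here = utility(node.selected)
        if score_here > best_score:
            best, best_score = node.selected, score_here
        # Rollout: randomly extend with further affordable chunks.
        sel, cost = set(node.selected), node.cost
        pool = node.candidates()
        random.shuffle(pool)
        for cid, cc in pool:
            if cost + cc <= budget:
                sel.add(cid)
                cost += cc
        rollout_score = utility(frozenset(sel))
        if rollout_score > best_score:
            best, best_score = frozenset(sel), rollout_score
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += rollout_score
            node = node.parent
    return best, best_score
```

A toy utility makes the non-monotonicity concrete: if two chunks are redundant, selecting both can score lower than selecting either alone, so the search can prefer a single expensive chunk over filling the budget with two cheap ones.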