Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on the alignment between the retriever and the LLM. Optimizing the retriever has emerged as an efficient alternative to fine-tuning the LLM. However, existing solutions suffer from a mismatch between the retriever's training objective and the end goal of the RAG pipeline. Reinforcement learning (RL) offers a promising way to close this gap, yet applying RL to retriever optimization introduces two fundamental challenges: 1) deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making the retriever optimizable by RL. Further, we incorporate the retrieval history into the state at each retrieval step to mitigate state aliasing. Extensive experiments across diverse RAG pipelines, datasets, and retriever scales demonstrate that our approach consistently improves RAG performance.
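The two ideas in the abstract can be illustrated with a minimal sketch (not the paper's actual implementation; function names, the temperature parameter, and the REINFORCE-style objective are illustrative assumptions): deterministic top-k retrieval is replaced by sampling a document from a softmax over query-document similarities, which gives each retrieval action a log-probability that a policy gradient can flow through, with the downstream RAG answer quality serving as the reward.

```python
import torch
import torch.nn.functional as F

# Sketch only: stochastic retrieval as an RL policy. Instead of a
# deterministic argmax/top-k, sample a document index from a softmax
# over similarity scores, so the action has a differentiable
# log-probability usable in a policy-gradient update.
def sample_document(query_emb, doc_embs, temperature=1.0):
    """Sample one document index and return its log-probability."""
    scores = doc_embs @ query_emb / temperature        # (num_docs,)
    probs = F.softmax(scores, dim=-1)
    dist = torch.distributions.Categorical(probs=probs)
    idx = dist.sample()
    return idx, dist.log_prob(idx)

# REINFORCE-style objective: rewards (e.g., downstream answer quality)
# are treated as constants; minimizing this loss increases the
# probability of retrieval actions that led to high reward.
def policy_gradient_loss(log_probs, rewards):
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    return -(torch.stack(list(log_probs)) * rewards).mean()
```

In a multi-hop setting, the state at each step would additionally include the retrieval history (e.g., by conditioning the query embedding on previously retrieved passages), which is how the abstract's second idea mitigates state aliasing.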