As retrieval-augmented generation (RAG) becomes increasingly widespread, the role of information retrieval (IR) is shifting from retrieving information for human users to retrieving contextual knowledge for artificial intelligence (AI) systems, where relevance is difficult to define or annotate beforehand. To address this challenge, we propose R3, a Retrieval framework optimized for RAG through trial-and-feedback Reinforced contrastive learning. Unlike prior approaches that rely on annotated or synthetic data for supervised fine-tuning, R3 enables the retriever to dynamically explore and optimize relevance within the RAG environment. During training, the retrieved results interact with the environment to produce contrastive signals that automatically guide the retriever's self-improvement. Extensive experiments across diverse tasks demonstrate that R3 improves RAG performance by 5.2% over the original retriever and surpasses state-of-the-art retrievers by 4.9%, while achieving results comparable to LLM-augmented retrieval and RAG systems built on post-trained or instruction-tuned LLMs. R3 is also efficient and practical, requiring only 4 GPUs and completing training within a single day.
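To make the idea of feedback-derived contrastive signals concrete, the following is a minimal sketch and not the objective used by R3, which the abstract does not specify. It assumes each retrieved document receives a scalar reward from the RAG environment (for example, whether the generator answered correctly when given that document), treats documents at or above a threshold as positives, and applies a generic InfoNCE-style loss. The function name `contrastive_loss` and the parameters `temperature` and `threshold` are hypothetical.

```python
import math


def contrastive_loss(scores, rewards, temperature=0.05, threshold=0.5):
    """InfoNCE-style loss where positives are chosen by environment feedback.

    scores:  retriever similarity scores, one per retrieved document.
    rewards: feedback values from the RAG environment, one per document
             (assumed scalar; how they are computed is outside this sketch).
    """
    if len(scores) != len(rewards):
        raise ValueError("scores and rewards must have the same length")

    # Documents whose feedback meets the threshold act as positives.
    positives = [i for i, r in enumerate(rewards) if r >= threshold]

    # Without both positives and negatives there is nothing to contrast.
    if not positives or len(positives) == len(scores):
        return 0.0

    logits = [s / temperature for s in scores]

    # Numerically stable log-sum-exp over all retrieved documents.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))

    # Average negative log-probability assigned to the positive documents.
    return -sum(logits[i] - log_z for i in positives) / len(positives)
```

Under these assumptions, the loss is small when the retriever already scores the rewarded documents highest and large when it ranks them below unrewarded ones, so minimizing it pushes the retriever toward documents that actually helped the downstream generator.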