Retrieval-Augmented Generation (RAG) has significantly enhanced LLMs by incorporating external information. However, prevailing agentic RAG approaches are constrained by a critical limitation: they treat the retrieval process as a black-box querying operation. This confines agents' actions to query issuing, hindering their ability to tackle complex information-seeking tasks. To address this, we introduce Interact-RAG, a new paradigm that elevates the LLM agent from a passive query issuer into an active manipulator of the retrieval process. We dismantle this black box with a Corpus Interaction Engine, equipping the agent with a set of action primitives for fine-grained control over information retrieval. To further empower the agent across the entire RAG pipeline, we first develop a reasoning-enhanced workflow, which enables both zero-shot execution and the synthesis of interaction trajectories. We then leverage this synthetic data to train a fully autonomous end-to-end agent via Supervised Fine-Tuning (SFT), followed by refinement with Reinforcement Learning (RL). Extensive experiments across six benchmarks demonstrate that Interact-RAG significantly outperforms other advanced methods, validating the efficacy of our reasoning-interaction strategy.
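To make the contrast with black-box querying concrete, the "action primitives" idea could, in spirit, look like the following minimal sketch. All class names, primitive names, and the toy scoring are illustrative assumptions; the paper's actual Corpus Interaction Engine interface is not specified here.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: int
    text: str

class CorpusInteractionEngine:
    """Hypothetical sketch: instead of one opaque query() call, the agent gets
    fine-grained primitives it can compose over the corpus."""

    def __init__(self, docs):
        self.docs = {d.doc_id: d for d in docs}

    def search(self, query, k=2):
        # Primitive 1: ranked keyword search (toy scoring by term overlap).
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.text.lower().split())), d.doc_id)
                  for d in self.docs.values()]
        scored.sort(reverse=True)
        return [doc_id for score, doc_id in scored[:k] if score > 0]

    def filter(self, doc_ids, must_contain):
        # Primitive 2: narrow a candidate set with a substring constraint.
        return [i for i in doc_ids
                if must_contain.lower() in self.docs[i].text.lower()]

    def read(self, doc_id):
        # Primitive 3: inspect a single document's full text.
        return self.docs[doc_id].text

# An agent could chain these primitives instead of issuing one black-box query.
engine = CorpusInteractionEngine([
    Doc(0, "RAG augments LLMs with retrieved passages"),
    Doc(1, "Agents can issue queries to a search engine"),
    Doc(2, "Reinforcement learning refines agent policies"),
])
hits = engine.search("RAG retrieved passages", k=3)   # ranked candidates
narrowed = engine.filter(hits, "LLMs")                # agent-imposed constraint
context = engine.read(narrowed[0])                    # targeted inspection
```

The point of the sketch is the control flow: the agent decides to search, then constrain, then read, rather than delegating everything to a single retriever call.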