We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling end-to-end coordination without additional controllers or classifiers. Under the paradigm of Retrieval as Generation, we propose \textbf{GRIP} (\textbf{G}eneration-guided \textbf{R}etrieval with \textbf{I}nformation \textbf{P}lanning), a unified framework in which the model regulates retrieval behavior through control-token emission. Central to GRIP is \textit{Self-Triggered Information Planning}, which allows the model to decide when to retrieve, how to reformulate queries, and when to terminate, all within a single autoregressive trajectory. This design tightly couples retrieval and reasoning and supports dynamic multi-step inference with on-the-fly evidence integration. To supervise these behaviors, we construct a structured training set covering answerable, partially answerable, and multi-hop queries, each aligned with specific token patterns. Experiments on five QA benchmarks show that GRIP surpasses strong RAG baselines and is competitive with GPT-4o while using substantially fewer parameters.
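To make the control flow concrete, the following is a minimal sketch of a decoding loop in which retrieval is triggered by the model's own output rather than by an external controller. It is an illustration under stated assumptions, not GRIP's implementation: the control-token strings `[RETRIEVE]` and `[ANSWER]`, the function `grip_style_decode`, and the toy `toy_generate` / `toy_retrieve` stubs are all hypothetical names introduced here, and a real system would use a trained language model and an actual retriever in their place.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical control tokens; the abstract does not name the actual ones.
RETRIEVE = "[RETRIEVE]"
ANSWER = "[ANSWER]"


def grip_style_decode(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    max_rounds: int = 4,
) -> Tuple[str, List[str]]:
    """Run one trajectory where emitted control tokens decide when to retrieve.

    `generate` maps the running context to the next text segment, and
    `retrieve` maps a query string to a list of passages. Both are
    placeholders for a real model and a real retriever.
    """
    context = f"Question: {question}\n"
    queries: List[str] = []
    for _ in range(max_rounds):
        segment = generate(context).strip()
        context += segment + "\n"
        if segment.startswith(RETRIEVE):
            # The text after the control token is the (re)formulated query.
            query = segment[len(RETRIEVE):].strip()
            queries.append(query)
            # Retrieved evidence is appended to the same trajectory.
            for passage in retrieve(query):
                context += f"Evidence: {passage}\n"
        elif segment.startswith(ANSWER):
            # The model chose to terminate and answer.
            return segment[len(ANSWER):].strip(), queries
        else:
            # No control token emitted: treat the segment as the final answer.
            return segment, queries
    # Retrieval budget exhausted without an answer.
    return "", queries


# Toy stand-ins so the sketch runs end to end (assumptions, not real components).
CORPUS: Dict[str, List[str]] = {
    "Marie Curie birthplace": ["Marie Curie was born in Warsaw."],
    "Warsaw country": ["Warsaw is the capital of Poland."],
}


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_generate(context: str) -> str:
    # Scripted "model": asks for each missing fact, then answers.
    if "Evidence: Marie Curie was born in Warsaw." not in context:
        return f"{RETRIEVE} Marie Curie birthplace"
    if "Evidence: Warsaw is the capital of Poland." not in context:
        return f"{RETRIEVE} Warsaw country"
    return f"{ANSWER} Poland"


answer, queries = grip_style_decode(
    "In which country was Marie Curie born?", toy_generate, toy_retrieve
)
```

In this toy run the scripted generator issues two retrieval queries before answering, mirroring the multi-hop case described above. The three decisions (when to retrieve, how to phrase the query, when to stop) all come from the generated text itself, which is the point of expressing retrieval control within decoding. The `max_rounds` cap is an added safeguard in this sketch, not something the abstract specifies.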