Retrieval-augmented generation (RAG) systems commonly improve robustness via query-time adaptations such as query expansion and iterative retrieval. While effective, these approaches are inherently stateless: adaptations are recomputed for each query and discarded thereafter, precluding cumulative learning and repeatedly incurring inference-time cost. Index-side approaches like key expansion introduce persistence but rely on offline preprocessing or heuristic updates that are weakly aligned with downstream task utility, leading to semantic drift and noise accumulation. We propose Evolving Retrieval Memory (ERM), a training-free framework that transforms transient query-time gains into persistent retrieval improvements. ERM updates the retrieval index through correctness-gated feedback, selectively attributes atomic expansion signals to the document keys they benefit, and progressively evolves keys via stable, norm-bounded updates. We show that query and key expansion are theoretically equivalent under standard similarity functions and prove convergence of ERM's selective updates, amortizing optimal query expansion into a stable index with zero inference-time overhead. Experiments on BEIR and BRIGHT across 13 domains demonstrate consistent gains in retrieval and generation, particularly on reasoning-intensive tasks, at native retrieval speed.
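The abstract's core mechanism, a correctness-gated, norm-bounded update to a document key vector, can be sketched as follows. This is a minimal illustration, not ERM's actual update rule: the function name `update_key`, the learning-rate and norm-cap parameters, and the final renormalization step are all assumptions introduced here for clarity.

```python
import numpy as np

def update_key(key, expansion, correct, lr=0.1, max_norm=0.05):
    """Illustrative correctness-gated, norm-bounded key update.

    key       : current document key vector (unit-normalized, assumed)
    expansion : atomic expansion signal attributed to this key
    correct   : whether the retrieval that used this signal led to a
                correct downstream answer (the correctness gate)
    """
    if not correct:
        # Gate: signals that did not help downstream are discarded,
        # so the index does not accumulate noise.
        return key
    delta = lr * expansion
    # Bound the update norm so keys evolve stably rather than drifting.
    norm = np.linalg.norm(delta)
    if norm > max_norm:
        delta = delta * (max_norm / norm)
    new_key = key + delta
    # Renormalize so similarity scores stay comparable (an assumption;
    # the appropriate scale depends on the similarity function used).
    return new_key / np.linalg.norm(new_key)
```

Because the gate rejects unhelpful signals and each accepted step is norm-capped, the key moves only in small increments toward directions that demonstrably improved downstream correctness, which is the intuition behind the stability and convergence claims above.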