Current Retrieval-Augmented Generation (RAG) systems typically employ a traditional two-stage pipeline: an embedding model for initial retrieval followed by a reranker for refinement. However, this paradigm suffers from significant inefficiency due to the lack of shared information between stages, leading to substantial redundant computation. To address this limitation, we propose \textbf{State-Centric Retrieval}, a unified retrieval paradigm that utilizes ``states'' as a bridge to connect embedding models and rerankers. First, we perform state representation learning by fine-tuning an RWKV-based LLM, transforming it into \textbf{EmbeddingRWKV}, a unified model that serves as both an embedding model and a state backbone for extracting compact, reusable states. Building upon these reusable states, we further design a state-based reranker to fully leverage precomputed information. During reranking, the model processes only query tokens, decoupling inference cost from document length and yielding a 5.4$\times$--44.8$\times$ speedup. Furthermore, we observe that retaining all intermediate layer states is unnecessary; with a uniform layer selection strategy, our model maintains 98.62\% of full-model performance using only 25\% of the layers. Extensive experiments demonstrate that State-Centric Retrieval achieves high-quality retrieval and reranking results while significantly enhancing overall system efficiency. Code is available at \href{https://github.com/howard-hou/EmbeddingRWKV}{our GitHub repository}.
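The core efficiency argument above can be illustrated with a minimal sketch. This is not the paper's implementation: it uses a toy elementwise recurrence as a stand-in for RWKV's state-mixing blocks, and all names (`step`, `encode`, `doc_states`) are hypothetical. The point it demonstrates is the state-reuse pattern: each document is encoded once offline into a fixed-size state, and at rerank time only the (short) query tokens pass through the recurrence, so online cost is independent of document length.

```python
import math
import random

random.seed(0)
D = 8  # toy state dimension (illustrative only)

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(D)]

def step(state, token):
    # One toy recurrence step: elementwise decay plus gated input.
    # A stand-in for an RWKV-style time-mixing block, not the real kernel.
    return [math.tanh(0.9 * s + 0.5 * x) for s, x in zip(state, token)]

def encode(tokens, state=None):
    # Run the recurrence over a token sequence; return the final state.
    # An optional initial `state` lets us resume from a cached document state.
    state = list(state) if state is not None else [0.0] * D
    for t in tokens:
        state = step(state, t)
    return state

# Offline: encode each document (50 tokens here) once and cache its state.
docs = [[rand_vec() for _ in range(50)] for _ in range(3)]
doc_states = [encode(d) for d in docs]

# Online reranking: only the 5 query tokens run through the model,
# continuing from each cached document state. Document length no
# longer appears in the online cost.
query = [rand_vec() for _ in range(5)]
scores = [sum(encode(query, state=s)) for s in doc_states]
best = max(range(len(scores)), key=scores.__getitem__)
```

Under this pattern, a corpus of length-$n$ documents costs $O(n)$ per document once at indexing time, while each rerank call costs only $O(|q|)$ per candidate, which is the source of the decoupling the abstract describes.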