Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the "lost in the middle" phenomenon degrades ranking quality as input length grows, and inference latency scales super-linearly with sequence length, making the approach impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace conventional autoregressive decoding with a one-step cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained with a dual-stage, multi-task, end-to-end joint optimization strategy that trains the encoder and reranker simultaneously, aligning the learning objectives of retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets demonstrate that ResRank achieves competitive or superior ranking effectiveness compared to existing approaches while generating zero tokens and processing only one token per passage, yielding a better balance between effectiveness and efficiency.
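The residual connection and one-step cosine scoring described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the two representations are combined or which vector the passages are compared against, so everything below is an assumption made for illustration: the residual is taken to be a plain element-wise sum, the query is represented by a single vector `query_vec`, and the function names `l2_normalize` and `residual_cosine_rank` are hypothetical, not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def residual_cosine_rank(encoder_emb, reranker_hidden, query_vec):
    """Score and rank passages in one step, with no token generation.

    encoder_emb:     (n, d) one compressed embedding per passage (Encoder-LLM output).
    reranker_hidden: (n, d) contextualized hidden state at each passage's
                     single-token position in the Reranker-LLM.
    query_vec:       (d,)   query representation (assumed to be a single vector).
    """
    # Assumed form of the residual connection: element-wise sum.
    fused = encoder_emb + reranker_hidden
    # Cosine similarity is the dot product of unit-normalized vectors.
    scores = l2_normalize(fused) @ l2_normalize(query_vec)
    # Passage indices sorted by descending score; stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    return scores, order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_passages, dim = 5, 8  # toy sizes, chosen only for the demo
    enc = rng.normal(size=(n_passages, dim))
    hid = rng.normal(size=(n_passages, dim))
    query = rng.normal(size=dim)
    scores, order = residual_cosine_rank(enc, hid, query)
    print("scores:", np.round(scores, 3))
    print("ranking:", order.tolist())
```

Under these assumptions, ranking cost depends on one vector per passage rather than on passage length, which is the efficiency property the abstract emphasizes.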