Recent studies have demonstrated the effectiveness of large language models (LLMs) in passage ranking. Listwise approaches, such as RankGPT, have become the new state-of-the-art in this task. However, the efficiency of RankGPT models is limited by the maximum context length and the relatively high latency of LLM inference. To address these issues, in this paper we propose PE-Rank, which leverages single passage embeddings as an effective form of context compression for efficient listwise passage reranking. By treating each passage as a special token, we can feed passage embeddings directly into the LLM, thereby reducing input length. Additionally, we introduce an inference method that dynamically constrains the decoding space to these special tokens, accelerating the decoding process. To adapt the model to reranking, we employ a listwise learning-to-rank loss for training. Evaluation results on multiple benchmarks demonstrate that PE-Rank significantly improves efficiency in both prefilling and decoding while maintaining competitive ranking effectiveness. {The code is available at \url{https://github.com/liuqi6777/pe_rank}.}
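The idea of dynamically constraining the decoding space can be sketched as follows. This is a hypothetical toy illustration, not the authors' implementation: `logits_fn` is a stand-in for an LLM forward pass, and at each step the candidate set is restricted to the special tokens of passages not yet ranked.

```python
import math

def constrained_rerank(logits_fn, passage_token_ids):
    """Greedily decode a ranking over passage special tokens.

    logits_fn(prefix) -> dict mapping token id to score (toy stand-in
    for an LLM forward pass over the current decoded prefix);
    passage_token_ids holds one special token per candidate passage.
    """
    remaining = set(passage_token_ids)
    ranking = []
    while remaining:
        scores = logits_fn(ranking)
        # Constrain the decoding space: only unused passage tokens
        # are eligible, so no other vocabulary item can be emitted.
        next_tok = max(remaining, key=lambda t: scores.get(t, -math.inf))
        ranking.append(next_tok)
        remaining.remove(next_tok)
    return ranking

# Toy scorer with a fixed preference (ignores the prefix): p2 > p0 > p1.
toy_scores = {0: 0.5, 1: 0.1, 2: 0.9}
print(constrained_rerank(lambda prefix: toy_scores, [0, 1, 2]))
# -> [2, 0, 1]
```

Because each step only scores the remaining special tokens rather than the full vocabulary, every decoded token directly extends the ranking, which is what shortens the decoding process.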