Reranking, the process of refining the output of a first-stage retriever, is often considered computationally expensive, especially when it relies on Large Language Models (LLMs). Common ways to reduce this cost are to use smaller LLMs or to limit input length. Inspired by recent advances in document compression for retrieval-augmented generation (RAG), we introduce RRK, an efficient and effective listwise reranker that compresses each document into a fixed-size, multi-token embedding representation. Trained with a simple distillation procedure, RRK shows that combining these rich compressed representations with listwise reranking yields a highly efficient and effective system. In particular, our 8B-parameter model runs 3-18x faster than smaller rerankers (0.6-4B parameters) while matching or outperforming them in effectiveness. The efficiency gains are even more striking on long-document benchmarks, where RRK widens its advantage further.
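Much of the efficiency argument comes from one property: once a document is compressed into a fixed number of embedding slots, the length of the listwise reranker's input no longer grows with document length. The sketch below illustrates only that property, under stated assumptions. RRK learns its compressor through distillation. Here, `compress` is a hypothetical stand-in that mean-pools contiguous segments of token vectors. `listwise_input_length` is also a hypothetical helper, and the query length, document count, document length, and `k` in the usage note are illustrative values, not settings from the paper.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def compress(token_vectors: Sequence[Vector], k: int) -> List[Vector]:
    """Hypothetical stand-in compressor (not RRK's learned one): split a
    document's token vectors into k contiguous segments and mean-pool each,
    so every document becomes exactly k vectors regardless of its length."""
    n = len(token_vectors)
    if k <= 0 or n < k:
        raise ValueError("need k > 0 and at least k token vectors")
    pooled = []
    for i in range(k):
        # With n >= k, every segment contains at least one token vector.
        segment = token_vectors[i * n // k:(i + 1) * n // k]
        # Average each embedding dimension over the segment.
        pooled.append([sum(dim) / len(segment) for dim in zip(*segment)])
    return pooled


def listwise_input_length(query_len: int, doc_lens: Sequence[int],
                          k: Optional[int] = None) -> int:
    """Sequence length seen by one listwise call: the query plus either each
    document's full token count (k=None) or k slots per compressed document."""
    if k is None:
        return query_len + sum(doc_lens)
    return query_len + k * len(doc_lens)
```

Take the hypothetical setting of a 32-token query and 20 candidates of 512 tokens each. Reranking the full text gives `listwise_input_length(32, [512] * 20)`, a 10,272-position input. With `k=8`, the same call gives 192 positions, and that figure stays the same however long the documents get. This is consistent with the abstract's observation that RRK's advantage widens on long-document benchmarks.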