Stop Overthinking: Unlocking Efficient Listwise Reranking with Minimal Reasoning

Listwise reranking utilizing Large Language Models (LLMs) has achieved state-of-the-art retrieval effectiveness. Recently, reasoning-enhanced models have further pushed these boundaries by employing Chain-of-Thought (CoT) to perform deep comparative analysis of candidate documents. However, this performance gain comes at a prohibitive computational cost, as models often generate thousands of reasoning tokens before producing a final ranking. In this work, we investigate the relationship between reasoning length and ranking quality, revealing an overthinking phenomenon where extended reasoning yields diminishing returns. To address this, we propose a Length-Regularized Self-Distillation framework. We synthesize a dataset by sampling diverse reasoning traces from a teacher model (Rank-K) and applying a Pareto-inspired filter to select traces that achieve high ranking performance with minimal token usage. By fine-tuning on these concise, high-quality rationales, the student model learns to internalize efficient reasoning patterns, effectively pruning redundant deliberation. Experiments on TREC Deep Learning and NeuCLIR benchmarks demonstrate that our method maintains the teacher's effectiveness while reducing inference token consumption by 34%-37% across different retrieval settings, offering a practical solution for deploying reasoning-enhanced rerankers in latency-sensitive applications.
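The abstract does not spell out the filtering rule, so the following is only a minimal sketch of what a Pareto-inspired selection over sampled reasoning traces could look like. It assumes each trace has already been scored with a ranking metric (nDCG is used here as a stand-in) and a reasoning-token count. The names `Trace`, `pareto_front`, `select_concise`, and the `tolerance` parameter are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trace:
    """One sampled reasoning trace for a query (hypothetical structure)."""
    query_id: str
    ndcg: float       # assumed quality score of the ranking this trace produced
    num_tokens: int   # number of reasoning tokens the trace consumed


def pareto_front(traces: List[Trace]) -> List[Trace]:
    """Keep traces that no other trace beats on both quality and length.

    A trace is dominated if another trace is at least as good on both
    objectives (higher nDCG, fewer tokens) and strictly better on one.
    """
    front = []
    for t in traces:
        dominated = any(
            o.ndcg >= t.ndcg
            and o.num_tokens <= t.num_tokens
            and (o.ndcg > t.ndcg or o.num_tokens < t.num_tokens)
            for o in traces
        )
        if not dominated:
            front.append(t)
    return front


def select_concise(traces: List[Trace], tolerance: float = 0.02) -> Trace:
    """Pick the shortest non-dominated trace whose nDCG is within
    `tolerance` of the best one. The tolerance is an assumed knob."""
    front = pareto_front(traces)
    best = max(t.ndcg for t in front)
    near_best = [t for t in front if t.ndcg >= best - tolerance]
    return min(near_best, key=lambda t: t.num_tokens)


# Illustrative, made-up samples for a single query.
samples = [
    Trace("q1", 0.72, 2400),
    Trace("q1", 0.71, 900),
    Trace("q1", 0.60, 400),
    Trace("q1", 0.70, 1500),
]
chosen = select_concise(samples)
```

In this toy input, the 1500-token trace is dominated by the 900-token one, and the selection step then prefers the 900-token trace over the 2400-token one because its score is within the tolerance of the best. Traces selected this way would form the fine-tuning set for the student; how the paper actually trades quality against length may differ.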

