Listwise reranking with Large Language Models (LLMs) has achieved state-of-the-art retrieval effectiveness. Recently, reasoning-enhanced models have improved effectiveness further by using Chain-of-Thought (CoT) reasoning to compare candidate documents in depth. However, this gain comes at a prohibitive computational cost, as models often generate thousands of reasoning tokens before producing a final ranking. In this work, we investigate the relationship between reasoning length and ranking quality and find an overthinking phenomenon in which extended reasoning yields diminishing returns. To address this, we propose a Length-Regularized Self-Distillation framework. We synthesize a dataset by sampling diverse reasoning traces from a teacher model (Rank-K) and applying a Pareto-inspired filter that selects traces achieving high ranking performance with minimal token usage. By fine-tuning on these concise, high-quality rationales, the student model learns to internalize efficient reasoning patterns, effectively pruning redundant deliberation. Experiments on the TREC Deep Learning and NeuCLIR benchmarks demonstrate that our method maintains the teacher's effectiveness while reducing inference token consumption by 34% to 37% across different retrieval settings, offering a practical solution for deploying reasoning-enhanced rerankers in latency-sensitive applications.
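To make the Pareto-inspired selection step concrete, the following is a minimal sketch of one way such a filter could work. It is not the exact procedure used in this work: the abstract does not specify the filter, so the `Trace` fields, the choice of nDCG as the quality measure, and the strict non-dominance rule are all assumptions made for illustration. For the traces sampled for a single query, it keeps only those for which no other trace is both at least as short and at least as effective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One sampled teacher output for a query (hypothetical structure)."""
    name: str       # identifier of the sampled trace
    ndcg: float     # ranking quality of the trace's final ranking (assumed metric)
    n_tokens: int   # number of generated reasoning tokens


def pareto_filter(traces):
    """Return the traces that are not dominated on (higher ndcg, fewer tokens).

    A trace is dominated if another trace uses no more tokens and scores
    at least as high. Exact duplicates keep one representative.
    """
    kept = []
    best_ndcg = float("-inf")
    # Visit traces from shortest to longest; on equal length, best score first.
    for t in sorted(traces, key=lambda t: (t.n_tokens, -t.ndcg)):
        # A longer trace survives only if it beats every shorter one.
        if t.ndcg > best_ndcg:
            kept.append(t)
            best_ndcg = t.ndcg
    return kept


# Illustrative values only, not results from the paper.
samples = [
    Trace("a", 0.70, 1200),
    Trace("b", 0.72, 3000),
    Trace("c", 0.65, 2000),  # dominated by "a": longer and worse
    Trace("d", 0.60, 400),
    Trace("e", 0.72, 3500),  # dominated by "b": same score, longer
]
print([t.name for t in pareto_filter(samples)])  # ['d', 'a', 'b']
```

In practice one would likely add a minimum quality threshold on top of this, so that very short but weak traces such as "d" are not used as fine-tuning targets; that threshold is likewise an assumption rather than something stated here.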