ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

Large Language Model (LLM)-based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models (LRMs), many studies have demonstrated that step-by-step reasoning at test time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios, and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated framework for synthesizing reasoning-intensive training data, which sources training queries and passages from diverse domains and uses DeepSeek-R1 to generate high-quality training labels. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage training approach consisting of a cold-start supervised fine-tuning (SFT) stage and a reinforcement learning (RL) stage. For the RL stage, we design a novel multi-view ranking reward tailored to the multi-turn nature of listwise ranking. Extensive experiments demonstrate that our trained reasoning-intensive reranker, ReasonRank, significantly outperforms existing baselines and also achieves much lower latency than the pointwise reranker. Our code is available at https://github.com/8421BCD/ReasonRank.
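An LLM can read only a limited window of passages at once. One common way to rerank a long candidate list listwise is therefore a sliding window that moves from the bottom of the list to the top, as popularized by RankGPT. Each window is one "turn", which is one plausible reading of the multi-turn nature mentioned above. The Python sketch below illustrates that loop and adds NDCG@k as one example of a ranking-quality signal from which a reward could be built. It is not ReasonRank's implementation, and it does not reproduce the paper's multi-view reward. `sliding_window_rerank`, `ndcg_at_k`, `rank_window`, the toy scores, and the default window/step sizes are all hypothetical and chosen for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# Illustrative sketch only, NOT ReasonRank's code.
# Hypothetical signature: a listwise "ranker" receives a query and a window of
# passage IDs and returns the same IDs reordered. In an LLM reranker this would
# be one model call; here it is any Python callable.
Ranker = Callable[[str, List[str]], List[str]]


def sliding_window_rerank(query: str, candidates: Sequence[str], rank_window: Ranker,
                          window: int = 20, step: int = 10) -> List[str]:
    """Rerank candidates with overlapping windows, moving from the bottom to the top.

    Each rank_window call is one turn. Because windows overlap, passages
    promoted in one turn are ranked again in the next, higher window.
    """
    if not 0 < step <= window:
        raise ValueError("need 0 < step <= window")
    order = list(candidates)
    end = len(order)
    while True:
        start = max(0, end - window)
        order[start:end] = rank_window(query, order[start:end])
        if start == 0:
            break
        end -= step
    return order


def ndcg_at_k(ranking: Sequence[str], relevance: Dict[str, int], k: int = 10) -> float:
    """NDCG@k with linear gain (gain = graded relevance) and a log2 position discount."""
    dcg = sum(relevance.get(pid, 0) / math.log2(i + 2) for i, pid in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy stand-in for the LLM: sort each window by made-up relevance scores.
    scores = {"p1": 0.1, "p2": 0.9, "p3": 0.4, "p4": 0.8, "p5": 0.2}
    toy_ranker = lambda q, ids: sorted(ids, key=lambda p: scores[p], reverse=True)

    ranked = sliding_window_rerank("q", ["p1", "p2", "p3", "p4", "p5"], toy_ranker,
                                   window=3, step=2)
    print(ranked)                                 # ['p2', 'p4', 'p1', 'p3', 'p5']
    print(ndcg_at_k(ranked, {"p2": 1, "p4": 1}))  # 1.0
```

In the toy run, `p3` stays below `p1` even though its score is higher. A single back-to-front pass reliably moves the strongest passages to the top, but lower positions are only partially sorted. This is why top-k metrics such as NDCG@10 are a natural fit for evaluating this setup.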
