In this paper, we introduce Rank-R1, a novel LLM-based reranker that reasons over both the user query and the candidate documents before performing the ranking task. Existing document reranking methods based on large language models (LLMs) typically rely on prompting or fine-tuning LLMs to order or label candidate documents according to their relevance to a query. For Rank-R1, we instead use a reinforcement learning algorithm together with only a small set of relevance labels (without any reasoning supervision) to enhance the reasoning ability of LLM-based rerankers. Our hypothesis is that adding reasoning capabilities to rerankers improves their relevance assessment and ranking effectiveness. Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries. In particular, we find that Rank-R1 achieves effectiveness on in-domain datasets on par with that of supervised fine-tuning methods, while using only 18\% of the training data required by those methods. We also find that the model largely outperforms zero-shot and supervised fine-tuned rerankers when applied to out-of-domain datasets featuring complex queries, especially when a 14B-size model is used. Finally, we qualitatively observe that Rank-R1's reasoning process improves the explainability of the ranking results, opening new opportunities for how search engine results are presented and consumed.