ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

Reranking is fundamental to information retrieval and retrieval-augmented generation, and recent Large Language Models (LLMs) have significantly advanced reranking quality. Most current work relies on large-scale LLMs (>7B parameters), which incur high computational costs. Small Language Models (SLMs) offer a promising alternative because of their computational efficiency. However, our preliminary quantitative analysis reveals two key limitations of SLMs: their representation space is narrow, which reduces expressiveness, and they struggle to understand task prompts without fine-tuning. To address these issues, we introduce ProRank, a novel two-stage training approach for SLM-based document reranking. In the first stage, we use reinforcement learning to improve the model's understanding of task prompts. In the second stage, we introduce fine-grained score learning to enhance representation expressiveness and further improve document reranking quality. Extensive experiments suggest that ProRank consistently outperforms the most advanced open-source and proprietary reranking models. Notably, our 0.5B ProRank even surpasses powerful LLM reranking models on the BEIR benchmark, establishing that properly trained SLMs can achieve superior document reranking performance while maintaining computational efficiency.
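For readers new to the task, the reranking step itself can be sketched as follows. This is a minimal illustration, not the ProRank method: `toy_score` is a hypothetical lexical-overlap scorer standing in for a trained model, and `rerank` and its `top_k` parameter are names chosen here for illustration only. In practice, a trained SLM reranker would take the place of the scoring function.

```python
from typing import Callable, List, Optional, Tuple


def toy_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer (assumption, not ProRank):
    the fraction of query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort by descending score.

    Python's sort is stable, so documents with equal scores keep the order
    in which the first-stage retriever returned them.
    """
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    candidates = [
        "large models cost more",
        "small language models are efficient",
        "cooking recipes",
    ]
    for doc, score in rerank("small language models", candidates):
        print(f"{score:.2f}  {doc}")
```

Passing the scorer as a function keeps the sorting logic independent of the model, so the same loop can serve a lexical baseline or a learned reranker.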

