Scaling laws have been observed across a wide range of tasks, such as natural language generation and dense retrieval, where performance follows predictable patterns as model size, data, and compute grow. However, these scaling laws are insufficient for understanding the scaling behavior of multi-stage retrieval systems, which typically include a reranking stage. In large-scale multi-stage retrieval systems, reranking is the final and most influential step before a ranked list of items is presented to the end user. In this work, we present the first systematic study of scaling laws for rerankers, analyzing performance across model sizes and data budgets for three popular paradigms: pointwise, pairwise, and listwise reranking. Through a detailed case study with cross-encoder rerankers, we demonstrate that performance follows a predictable power law. This regularity allows us to forecast the performance of larger models from smaller-scale experiments, more accurately for some metrics than for others, offering a robust methodology for saving significant computational resources. For example, we accurately estimate the NDCG of a 1B-parameter model by training and evaluating only smaller models (up to 400M parameters), in both in-domain and out-of-domain settings. Our experiments span several loss functions, models, and metrics. They show that downstream metrics such as NDCG and MAP (Mean Average Precision) exhibit reliable scaling behavior and can be forecasted accurately at scale, while highlighting the limitations of metrics such as Contrastive Entropy and MRR (Mean Reciprocal Rank), which do not follow predictable scaling behavior in all instances. Our results establish scaling principles for reranking and provide actionable insights for building industrial-grade retrieval systems.
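The forecasting methodology the abstract describes can be illustrated with a minimal sketch: fit a power law to reranker quality measured at small model sizes, then extrapolate to a larger size. The parameter counts, NDCG values, and the specific functional form `1 - NDCG(N) = a * N^(-b)` below are illustrative assumptions, not the paper's actual data or fitted law; in log-log space the fit reduces to ordinary least squares.

```python
import math

# Hypothetical NDCG@10 measurements for small rerankers at several
# parameter counts (illustrative numbers, not results from the paper).
sizes = [30e6, 100e6, 200e6, 400e6]   # model parameters
ndcg = [0.40, 0.44, 0.46, 0.475]      # assumed NDCG@10 per size

# Assume the *error* follows a power law: 1 - NDCG(N) = a * N^(-b),
# so log(1 - NDCG) is linear in log(N) and a least-squares line suffices.
xs = [math.log(n) for n in sizes]
ys = [math.log(1.0 - s) for s in ndcg]

k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
intercept = my - slope * mx

def predict_ndcg(params: float) -> float:
    """Extrapolate NDCG for a model with `params` parameters."""
    return 1.0 - math.exp(intercept + slope * math.log(params))

# Forecast quality of a 1B-parameter reranker from sub-400M fits.
print(f"forecast NDCG@10 at 1B params: {predict_ndcg(1e9):.3f}")
```

In practice one would validate such a fit on held-out model sizes before trusting the extrapolation, since (as the abstract notes) not every metric scales predictably.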