Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

The Probability Ranking Principle (PRP) has long been regarded as the foundational standard in the design of information retrieval (IR) systems. The principle requires that the list of results returned by an IR module be ranked by the users' underlying interest, so as to maximize the utility of the results. Nevertheless, we point out that it is inappropriate to apply PRP indiscriminately at every stage of a contemporary IR system. Such systems contain multiple stages (e.g., the retrieval, pre-ranking, ranking, and re-ranking stages examined in this paper), and the selection bias inherent in the model of each stage significantly influences which results are ultimately presented to users. To address this issue, we propose an improved ranking principle for multi-stage systems, the Generalized Probability Ranking Principle (GPRP), which takes into account both the selection bias of each stage in the system pipeline and the underlying interest of users. We realize GPRP via a unified algorithmic framework named Full Stage Learning to Rank. Our core idea is to first estimate the selection bias of the subsequent stages and then learn a ranking model that best complies with the downstream modules' selection bias, so that its top-ranked results are delivered to the final ranked list in the system's output. We extensively evaluated the Full Stage Learning to Rank solution using both simulations and online A/B tests on one of the leading short-video recommendation platforms. The algorithm proved effective in both the retrieval and ranking stages, and since its deployment it has brought consistent and significant performance gains to the platform.
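The abstract states the core idea only in words: estimate the selection bias of the downstream stages, then rank so as to comply with it. The Python sketch below illustrates that idea under our own assumptions and is not the paper's algorithm. It estimates selection bias as the empirical rate at which an item that enters the downstream stages reaches the final list, counted from hypothetical logged traces, and it reads GPRP as ranking by estimated interest multiplied by that survival rate. The trace format, the names `estimate_survival`, `prp_rank`, and `gprp_rank`, and the multiplicative combination are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical log record for one pass through the downstream stages:
# (items handed to the downstream stages, items that reached the final list).
Trace = Tuple[Set[str], Set[str]]


def estimate_survival(traces: Iterable[Trace]) -> Dict[str, float]:
    """Frequency estimate of P(item reaches final list | item enters downstream).

    Assumption: a simple count stands in for the paper's selection-bias
    estimator, which is not specified in the abstract.
    """
    entered: Counter = Counter()
    survived: Counter = Counter()
    for candidates, final in traces:
        entered.update(candidates)
        # Only credit items that were actually among the candidates.
        survived.update(candidates & final)
    return {item: survived[item] / n for item, n in entered.items()}


def prp_rank(interest: Dict[str, float]) -> List[str]:
    """PRP: order items by estimated user interest alone."""
    return sorted(interest, key=interest.__getitem__, reverse=True)


def gprp_rank(
    interest: Dict[str, float],
    survival: Dict[str, float],
    unseen_survival: float = 0.0,
) -> List[str]:
    """GPRP-style sketch (hypothetical): weight interest by downstream survival.

    Items never observed in the logs get `unseen_survival` as their rate.
    """
    def score(item: str) -> float:
        return interest[item] * survival.get(item, unseen_survival)

    return sorted(interest, key=score, reverse=True)


if __name__ == "__main__":
    # Toy logs: item "a" enters the downstream stages three times but never survives.
    traces = [
        ({"a", "b", "c"}, {"b", "c"}),
        ({"a", "b", "d"}, {"b", "d"}),
        ({"a", "c", "d"}, {"c", "d"}),
        ({"b", "c", "d"}, {"b", "c"}),
    ]
    interest = {"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.5}
    survival = estimate_survival(traces)
    print(prp_rank(interest))             # ['a', 'b', 'c', 'd']
    print(gprp_rank(interest, survival))  # ['b', 'c', 'd', 'a']
```

In this toy example item `a` has the highest estimated interest, so PRP ranks it first, yet the logged downstream stages never let it through; the GPRP-style score demotes it to last. A frequency count is the simplest possible estimator, and a per-item product ignores interactions between candidates that compete in the same list, so a production system would more plausibly learn these quantities with a model.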

翻译：概率排序原则（PRP）一直被视为信息检索系统设计的基石。该原则要求信息检索模块返回的结果列表需根据用户潜在兴趣进行排序，以最大化结果效用。然而，我们指出，在当代信息检索系统的每个阶段不加区分地应用PRP是不恰当的。这类系统包含多个阶段（例如本文研究的检索、预排序、排序和重排序阶段），每个阶段模型固有的**选择偏差**会显著影响最终呈现给用户的结果。为解决此问题，我们提出一种适用于多阶段系统的改进排序原则——广义概率排序原则（GPRP），以同时强调系统流程中各阶段的选择偏差与用户的潜在兴趣。我们通过名为“全文学习排序”的统一算法框架实现了GPRP。核心思路是：首先估计后续阶段中的选择偏差，然后学习一个最符合下游模块选择偏差的排序模型，使其排名最高的结果能够进入系统输出的最终排序列表。我们在领先的短视频推荐平台上，通过仿真实验和在线A/B测试对开发的全局学习排序方案进行了广泛评估。该算法在检索和排序阶段均被证明有效。自部署以来，该算法为平台带来了持续且显著的性能提升。