In industrial learning-to-rank (LTR) systems, the output of a ranking model is commonly modified, either as a result of post-processing logic that enforces business requirements or as a result of unforeseen design flaws or bugs present in real-world production systems. This poses a challenge for deploying off-policy learning and evaluation methods, as these often rely on the assumption that the rankings implied by the model's scores coincide with the items displayed to users. Reliable offline evaluation further requires proper randomization and correct estimation of the propensity of displaying each item in any given position of the ranking, both of which are also affected by such post-processing. We investigate empirically how these scenarios impair off-policy evaluation for learning-to-rank models. We then propose a novel correction method based on the Birkhoff-von-Neumann decomposition that is robust to this type of post-processing. In offline experiments, we obtain more accurate off-policy estimates, overcoming the problem of post-processed rankings. To the best of our knowledge, this is the first study on the impact of real-world business rules on offline evaluation of LTR models.
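As background for the Birkhoff-von-Neumann decomposition named in the abstract, the following is a minimal Python sketch of the decomposition itself, not of the proposed correction method, whose details the abstract does not give. It assumes a square doubly stochastic matrix whose entry `(i, j)` is read as the probability that item `i` is displayed in position `j`, and it expresses that matrix as a weighted mixture of permutations (rankings). The function names, the tolerance `tol`, and the example matrix are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _perfect_matching(support: List[List[bool]]) -> List[int]:
    """Find a perfect matching with augmenting paths (Kuhn's algorithm).

    Returns perm such that row i is matched to column perm[i].
    """
    n = len(support)
    col_to_row = [-1] * n

    def try_row(i: int, seen: List[bool]) -> bool:
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if col_to_row[j] == -1 or try_row(col_to_row[j], seen):
                    col_to_row[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            raise ValueError("no perfect matching; is the input doubly stochastic?")

    perm = [0] * n
    for j, i in enumerate(col_to_row):
        perm[i] = j
    return perm


def birkhoff_von_neumann(
    matrix: Matrix, tol: float = 1e-9
) -> List[Tuple[float, List[int]]]:
    """Greedy decomposition into (weight, permutation) terms.

    Assumes `matrix` is square and doubly stochastic (rows and columns sum to 1).
    """
    residual = [row[:] for row in matrix]
    n = len(residual)
    terms: List[Tuple[float, List[int]]] = []
    # Each pass zeroes at least one entry, so at most n * n passes are needed.
    while max(max(row) for row in residual) > tol:
        support = [[value > tol for value in row] for row in residual]
        perm = _perfect_matching(support)
        weight = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= weight
        terms.append((weight, perm))
    return terms


# Hypothetical item-by-position display probabilities for three items.
example = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
for weight, ranking in birkhoff_von_neumann(example):
    print(weight, ranking)
```

Each returned term pairs a mixture weight with a permutation `perm`, where `perm[i]` is the position assigned to item `i`. Sampling a ranking in proportion to the weights reproduces the marginal item-position probabilities of the input matrix. The greedy choice of matching is one of several possible and does not aim to minimize the number of terms.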