Model-based Unbiased Learning to Rank

Unbiased Learning to Rank (ULTR), which learns to rank documents from biased user feedback, is a well-known challenge in information retrieval. Existing ULTR methods typically rely on click modeling or inverse propensity weighting (IPW). Unfortunately, search engines face a severely long-tailed query distribution, which neither click modeling nor IPW handles well. Click modeling suffers from data sparsity, since the same query-document pair appears only a few times for tail queries; IPW suffers from high variance, since it is highly sensitive to small propensity scores. A general debiasing framework that works well on tail queries is therefore urgently needed. To address this problem, we propose a model-based unbiased learning-to-rank framework. Specifically, we develop a general context-aware user simulator that generates pseudo clicks for unobserved ranked lists to train rankers, which addresses the data sparsity problem. In addition, to account for the discrepancy between pseudo clicks and actual clicks, we treat the observation of a ranked list as the treatment variable and combine inverse propensity weighting with the pseudo labels in a doubly robust way. The derived bias and variance indicate that the proposed model-based method is more robust than existing methods. Finally, extensive experiments on benchmark datasets, including simulated datasets and real click logs, demonstrate that the proposed model-based method consistently outperforms state-of-the-art methods in various scenarios. The code is available at https://github.com/rowedenny/MULTR.
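The abstract contrasts IPW, which is unbiased but has high variance when propensities are small, with a doubly robust (DR) combination of IPW and pseudo labels. The sketch below illustrates that trade-off under stated assumptions. It uses the generic textbook DR estimator for a binary treatment o (here, whether a unit is observed), not the exact MULTR formulation. The names `Unit`, `ipw`, `doubly_robust`, and `exact_moments` are hypothetical. The code enumerates o ∈ {0, 1}, so the means and variances it reports are exact rather than sampled.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Unit:
    """One logged unit, e.g. a query with a candidate ranked list (illustrative)."""
    y: float      # true outcome, e.g. a click-derived label (only used when o = 1)
    y_hat: float  # pseudo label from a user simulator (assumed available)
    pi: float     # estimated propensity P(o = 1) used by the estimators


def ipw(o: int, u: Unit) -> float:
    """IPW: only observed units (o = 1) contribute, reweighted by 1 / pi."""
    return o * u.y / u.pi


def doubly_robust(o: int, u: Unit) -> float:
    """Generic DR: start from the pseudo label and add the IPW-weighted
    residual (y - y_hat) on observed units."""
    return u.y_hat + o * (u.y - u.y_hat) / u.pi


def exact_moments(estimator: Callable[[int, Unit], float],
                  u: Unit, p_obs: float) -> Tuple[float, float]:
    """Exact mean and variance of an estimator when o ~ Bernoulli(p_obs).

    p_obs is the true observation probability; setting p_obs != u.pi
    simulates a misspecified propensity model."""
    outcomes = [(1.0 - p_obs, estimator(0, u)), (p_obs, estimator(1, u))]
    mean = sum(w * v for w, v in outcomes)
    var = sum(w * (v - mean) ** 2 for w, v in outcomes)
    return mean, var


# Tail-like case: small propensity, imperfect pseudo label, correct propensity.
u = Unit(y=1.0, y_hat=0.8, pi=0.05)
print(exact_moments(ipw, u, p_obs=0.05))            # approx. (1.0, 19.0)
print(exact_moments(doubly_robust, u, p_obs=0.05))  # approx. (1.0, 0.76)

# Misspecified propensity (true p_obs = 0.1) but an accurate pseudo label.
u2 = Unit(y=1.0, y_hat=1.0, pi=0.05)
print(exact_moments(ipw, u2, p_obs=0.1))            # approx. (2.0, 36.0): biased
print(exact_moments(doubly_robust, u2, p_obs=0.1))  # approx. (1.0, 0.0): unbiased
```

With correct propensities, both estimators are unbiased. However, IPW's variance is y²(1 − π)/π, while DR's is (y − ŷ)²(1 − π)/π. A reasonably accurate simulator therefore shrinks the variance that small propensities inflate. If instead the propensity is wrong but the pseudo label is right, DR stays unbiased while IPW does not, which is the sense in which the combination is "doubly" robust.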
