User interaction data is an important source of supervision in counterfactual learning to rank (CLTR). Such data suffers from presentation bias. Much work in unbiased learning to rank (ULTR) focuses on position bias, i.e., items at higher ranks are more likely to be examined and clicked. Inter-item dependencies also influence examination probabilities, and outlier items in a ranking are an important example. Outliers are defined as items that observably deviate from the rest and therefore stand out in the ranking. In this paper, we identify and introduce the bias brought about by outlier items: users tend to click more on outlier items and their close neighbors. To this end, we first conduct a controlled experiment to study the effect of outliers on user clicks. Next, to examine whether the findings from our controlled experiment generalize to naturalistic situations, we explore real-world click logs from an e-commerce platform. We show that, in both scenarios, users click significantly more on outlier items than on non-outlier items in the same rankings. We show that this tendency holds for all positions, i.e., at any specific position, an item receives more interactions when presented as an outlier than when presented as a non-outlier. We conclude from our analysis that the effect of outliers on clicks is a type of bias that should be addressed in ULTR. We therefore propose an outlier-aware click model that accounts for both outlier and position bias, called the outlier-aware position-based model (OPBM). We estimate click propensities based on OPBM; through extensive experiments on both real-world e-commerce data and semi-synthetic data, we verify the effectiveness of our outlier-aware click model. Our results show that OPBM outperforms the baselines in terms of ranking performance and true relevance estimation.
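The abstract does not state the functional form of OPBM. As a rough illustration of the idea only, the sketch below assumes that a click happens when an item is examined and relevant, and that the examination probability depends on both the item's rank and the rank of the outlier in the list (if any). All function names, the `0.9 / rank` base curve, and the lift values are hypothetical choices made for this example and are not taken from the paper.

```python
from typing import Optional, Sequence


def position_propensity(rank: int) -> float:
    """Position-only examination probability for a 1-indexed rank.

    The 0.9 / rank curve is a hypothetical placeholder, not a fitted value.
    """
    return 0.9 / rank


def outlier_aware_propensity(
    rank: int,
    outlier_rank: Optional[int],
    outlier_lift: float = 0.5,   # assumed boost for the outlier itself
    neighbor_lift: float = 0.2,  # assumed boost for its direct neighbors
) -> float:
    """Examination probability given the item's rank and the outlier's rank.

    With outlier_rank=None the list has no outlier and this reduces to the
    position-only propensity.
    """
    base = position_propensity(rank)
    if outlier_rank is None:
        return base
    distance = abs(rank - outlier_rank)
    if distance == 0:
        lift = outlier_lift
    elif distance == 1:
        lift = neighbor_lift
    else:
        lift = 0.0
    # Move the base probability a fraction `lift` of the way toward 1,
    # which keeps the result inside [0, 1].
    return base + lift * (1.0 - base)


def click_probability(
    relevance: float, rank: int, outlier_rank: Optional[int]
) -> float:
    """P(click) = P(examined | rank, outlier rank) * P(relevant)."""
    return outlier_aware_propensity(rank, outlier_rank) * relevance


def ips_relevance_estimate(
    clicks: Sequence[int],
    ranks: Sequence[int],
    outlier_ranks: Sequence[Optional[int]],
) -> float:
    """Inverse-propensity-weighted relevance estimate for one item.

    Each impression contributes click / propensity, so clicks earned under a
    high examination probability (e.g. as an outlier) count for less.
    """
    if not clicks:
        raise ValueError("at least one impression is required")
    weighted = [
        c / outlier_aware_propensity(k, o)
        for c, k, o in zip(clicks, ranks, outlier_ranks)
    ]
    return sum(weighted) / len(weighted)
```

Under these assumptions, the expected click rate divided by the matching propensity equals the relevance, which is why correcting for position alone would overestimate the relevance of items that were shown as outliers.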