Recommender models can generally be trained on two types of data: explicit feedback and implicit feedback. Implicit feedback, such as click signals, is widely adopted because it is readily available. Applying it raises two main challenges. First, implicit data contains only positive feedback, so we cannot tell whether a non-interacted item is truly negative or is relevant but was never displayed to the user. Second, the relevance of rare items is usually underestimated, because far less positive feedback is collected for rare items than for popular ones. To tackle these difficulties, both pointwise and pairwise solutions have been proposed for unbiased relevance learning. Because pairwise learning is well suited to ranking tasks, the previously proposed unbiased pairwise learning algorithm already achieves state-of-the-art performance. Nonetheless, the existing unbiased pairwise learning method suffers from high variance. To obtain satisfactory performance, a non-negative estimator is used in practice to control variance, but it introduces additional bias. In this work, we propose an unbiased pairwise learning method, named UPL, with much lower variance to learn a truly unbiased recommender model. Extensive offline experiments on real-world datasets and online A/B testing demonstrate the superior performance of the proposed method.
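The trade-off between variance and bias mentioned above can be illustrated with a toy calculation. The sketch below is not the UPL estimator or the estimator of any prior method. It assumes a simple click model in which a click occurs with probability theta * gamma (theta: exposure propensity, gamma: relevance), and it uses an inverse-propensity estimate of the "irrelevance" 1 - gamma as a stand-in. The function name and parameter values are hypothetical.

```python
def estimator_moments(theta, gamma, clip=False):
    """Exact mean and variance of the toy estimate 1 - Y / theta.

    Assumed click model (an illustration only): Y ~ Bernoulli(theta * gamma).
    With clip=True the estimate is truncated at zero, mimicking a
    non-negative estimator.
    """
    p_click = theta * gamma
    outcomes = [(1, p_click), (0, 1.0 - p_click)]  # (click value, probability)

    weighted_values = []
    for y, prob in outcomes:
        value = 1.0 - y / theta      # inverse-propensity-weighted estimate
        if clip:
            value = max(value, 0.0)  # non-negative truncation
        weighted_values.append((value, prob))

    mean = sum(v * p for v, p in weighted_values)
    var = sum(p * (v - mean) ** 2 for v, p in weighted_values)
    return mean, var


# A rarely exposed item: low propensity, moderate relevance (assumed values).
theta, gamma = 0.1, 0.5
mean_raw, var_raw = estimator_moments(theta, gamma)
mean_clip, var_clip = estimator_moments(theta, gamma, clip=True)

print("target 1 - gamma:", 1.0 - gamma)
print("unclipped mean, variance:", mean_raw, var_raw)
print("clipped mean, variance:", mean_clip, var_clip)
```

Under these assumptions the unclipped estimate has the correct expectation, 1 - gamma, but a large variance when theta is small. Clipping at zero shrinks the variance sharply while shifting the expectation to 1 - theta * gamma, so the clipped estimate is no longer unbiased.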