Model training for recommender systems generally relies on two types of data: explicit feedback and implicit feedback. Implicit feedback, such as click signals, is widely adopted because it is readily available. Applying it poses two main challenges. First, implicit data contains only positive feedback, so it is unclear whether a non-interacted item is truly negative or is positive but was never displayed to the user. Second, the relevance of rare items is usually underestimated, since far less positive feedback is collected for rare items than for popular ones. To tackle these difficulties, both pointwise and pairwise solutions have been proposed for unbiased relevance learning. Because pairwise learning is well suited to ranking tasks, the previously proposed unbiased pairwise learning algorithm achieves state-of-the-art performance. Nonetheless, the existing unbiased pairwise learning method suffers from high variance. To obtain satisfactory performance in practice, a non-negative estimator is used to control the variance, but it introduces additional bias. In this work, we propose an unbiased pairwise learning method, named UPL, with much lower variance to learn a truly unbiased recommender model. Extensive offline experiments on real-world datasets and online A/B testing demonstrate the superior performance of the proposed method.
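The bias-variance trade-off behind a non-negative estimator can be seen in a toy simulation. The sketch below is an illustration under stated assumptions, not the UPL estimator or the prior method's exact loss: it assumes a click occurs only when an item is both exposed (with a hypothetical propensity `theta`) and relevant (with a hypothetical probability `gamma`), and it uses an inverse-propensity-weighted per-sample term, `1 - click / theta`, as a stand-in for an unbiased estimate of irrelevance. Clipping that term at zero lowers its variance but shifts its mean away from the true value `1 - gamma`.

```python
import numpy as np


def simulate(theta=0.2, gamma=0.5, n=200_000, seed=0):
    """Compare an unbiased weighted term with its non-negative (clipped) version.

    theta: hypothetical exposure propensity (assumption for illustration).
    gamma: hypothetical relevance probability (assumption for illustration).
    """
    rng = np.random.default_rng(seed)
    exposed = rng.random(n) < theta        # was the item shown to the user?
    relevant = rng.random(n) < gamma       # would the user like it if shown?
    click = (exposed & relevant).astype(float)

    # Unbiased for the irrelevance probability 1 - gamma, but it goes
    # negative whenever a click is observed and theta < 1.
    ips = 1.0 - click / theta
    # Non-negative variant: clip at zero to tame the variance.
    clipped = np.maximum(ips, 0.0)

    return {
        "target": 1.0 - gamma,
        "ips_mean": ips.mean(),
        "ips_var": ips.var(),
        "clipped_mean": clipped.mean(),
        "clipped_var": clipped.var(),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting the clipped term has a much smaller variance, while its mean overshoots the target, which is the kind of additional bias the abstract refers to.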