Ranking items according to individual user interests is a core technique in many downstream tasks, such as recommender systems. Learning such a personalized ranker typically relies on implicit feedback from users' past click-through behavior. However, the collected feedback is biased toward items that were previously ranked highly, and learning from it directly results in a "rich-get-richer" phenomenon. In this paper, we propose a simple yet sufficient unbiased learning-to-rank paradigm named InfoRank that aims to address both position and popularity biases simultaneously. We begin by consolidating the impacts of these biases into a single observation factor, which provides a unified approach to bias-related issues. We then minimize the mutual information between the observation estimation and the relevance estimation, conditioned on the input features. By doing so, our relevance estimation can be proved to be free of bias. To implement InfoRank, we first incorporate an attention mechanism to capture latent correlations within user-item features and generate estimations of observation and relevance. We then introduce a regularization term, grounded in conditional mutual information, to promote conditional independence between the relevance estimation and the observation estimation. Experimental evaluations on three large-scale recommendation and search datasets show that InfoRank learns more precise and unbiased ranking strategies.
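To make the conditional-independence objective concrete, the following is a minimal sketch of the quantity being regularized, not InfoRank's actual implementation. It assumes binary observation and relevance variables and a small discrete feature variable, so that the conditional mutual information I(O; R | X) can be computed exactly from a joint table. The function names `conditional_mutual_information` and `click_probability`, the toy tables, and the factorization of a click into observation times relevance (the usual examination hypothesis in unbiased learning-to-rank) are illustrative assumptions that the abstract itself does not state.

```python
import math


def conditional_mutual_information(joint):
    """Return I(O; R | X) in nats for a table joint[x][o][r] of non-negative weights."""
    total = sum(w for table in joint for row in table for w in row)
    cmi = 0.0
    for table in joint:
        p_x = sum(sum(row) for row in table) / total
        if p_x == 0:
            continue  # feature values that never occur contribute nothing
        n_o, n_r = len(table), len(table[0])
        # Conditional joint p(o, r | x) and its two marginals.
        cond = [[table[o][r] / (total * p_x) for r in range(n_r)] for o in range(n_o)]
        p_o = [sum(cond[o]) for o in range(n_o)]
        p_r = [sum(cond[o][r] for o in range(n_o)) for r in range(n_r)]
        for o in range(n_o):
            for r in range(n_r):
                if cond[o][r] > 0:
                    cmi += p_x * cond[o][r] * math.log(cond[o][r] / (p_o[o] * p_r[r]))
    return cmi


def click_probability(p_observed, p_relevant):
    """Assumed examination-style factorization: a click needs observation and relevance."""
    return p_observed * p_relevant


# Toy case 1: for each feature value x, observation and relevance are independent.
independent = [
    [[0.28, 0.42], [0.12, 0.18]],  # p(o|x) = (0.7, 0.3), p(r|x) = (0.4, 0.6)
    [[0.10, 0.10], [0.40, 0.40]],  # p(o|x) = (0.2, 0.8), p(r|x) = (0.5, 0.5)
]

# Toy case 2: relevance always equals observation, the fully entangled case.
dependent = [
    [[0.5, 0.0], [0.0, 0.5]],
    [[0.5, 0.0], [0.0, 0.5]],
]

if __name__ == "__main__":
    print(conditional_mutual_information(independent))  # approximately 0
    print(conditional_mutual_information(dependent))    # log(2), about 0.693
    print(click_probability(0.5, 0.8))
```

In the first table the value is zero up to floating-point error, which is the state the regularization term pushes toward. In the second it reaches log 2, meaning the relevance estimate carries the full observation bias. A trainable model would need a differentiable estimate of this quantity over continuous features, which this toy computation does not attempt.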