Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

Ranking is at the core of many artificial intelligence (AI) applications, such as search engines and recommender systems. Modern ranking systems are often built with learning-to-rank (LTR) models constructed from user behavior signals. While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels for LTR algorithms, we argue that existing LTR algorithms, which treat behavior and non-behavior signals in input features indiscriminately, can lead to suboptimal performance in practice. Because user behavior signals are often strongly correlated with the ranking objective and can only be collected on items that have already been shown to users, using them directly in LTR can create an exploitation bias that hurts system performance in the long run. To address this exploitation bias, we propose EBRank, an uncertainty-aware ranking algorithm based on empirical Bayes. Specifically, to avoid the exploitation bias introduced by behavior features in ranking models, EBRank uses a prior model built solely on non-behavior features to obtain a prior estimate of relevance. During the dynamic training and serving of the ranking system, EBRank uses the observed user behaviors to update the posterior relevance estimate instead of concatenating behaviors as features in the ranking model. In addition, EBRank applies an uncertainty-aware exploration strategy that actively explores to collect user behaviors for empirical Bayesian modeling and to improve ranking performance. Experiments on three public datasets show that EBRank is effective and practical, and that it significantly outperforms state-of-the-art ranking algorithms.
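To make the prior, posterior and exploration steps described above more concrete, here is a minimal Python sketch. It is not EBRank's implementation and makes several assumptions the abstract does not state: each item's click probability gets a Beta prior centered on the score of a non-behavior model, every exposure is treated as an examination (no position-bias correction), the prior strength is a fixed hyperparameter rather than fitted empirically from data, and an upper-confidence-style bonus (posterior mean plus a multiple of the posterior standard deviation) stands in for the uncertainty-aware exploration strategy. The names `ItemStats`, `beta_posterior`, `posterior_mean_and_std`, `rank`, `prior_strength` and `exploration_weight` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ItemStats:
    """Hypothetical per-item state (illustrative, not EBRank's data model)."""
    prior_relevance: float  # prior relevance in [0, 1] from a non-behavior model
    clicks: int = 0         # observed clicks on this item
    exposures: int = 0      # times shown; assumed to equal times examined


def beta_posterior(item: ItemStats, prior_strength: float) -> Tuple[float, float]:
    """Beta-Bernoulli update (an assumed model).

    The prior mean is the non-behavior estimate, weighted as `prior_strength`
    pseudo-observations; clicks and non-clicks are then added as counts.
    """
    alpha0 = item.prior_relevance * prior_strength
    beta0 = (1.0 - item.prior_relevance) * prior_strength
    return alpha0 + item.clicks, beta0 + (item.exposures - item.clicks)


def posterior_mean_and_std(alpha: float, beta: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, math.sqrt(var)


def rank(items: Sequence[ItemStats], prior_strength: float = 10.0,
         exploration_weight: float = 1.0) -> List[int]:
    """Return item indices sorted by posterior mean + weight * posterior std.

    The std bonus is a UCB-style stand-in for uncertainty-aware exploration:
    rarely shown items have wider posteriors and get a larger boost.
    """
    scores = []
    for item in items:
        a, b = beta_posterior(item, prior_strength)
        mean, std = posterior_mean_and_std(a, b)
        scores.append(mean + exploration_weight * std)
    return sorted(range(len(items)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    items = [
        ItemStats(prior_relevance=0.5, clicks=22, exposures=50),  # well observed
        ItemStats(prior_relevance=0.4),                           # never shown
    ]
    print(rank(items, exploration_weight=0.0))  # [0, 1]: pure exploitation
    print(rank(items, exploration_weight=1.0))  # [1, 0]: exploration lifts item 1
```

In this toy setting the observed item has a posterior mean of 0.45 and the unseen item keeps its prior mean of 0.4, so pure exploitation never shows the unseen item. A positive `exploration_weight` lets the unseen item's wider posterior move it to the top, so behavior data gets collected for it; the observed clicks then, rather than behavior features fed into the prior model, drive its later relevance estimate.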
