Uncertainty Quantification for Fairness in Two-Stage Recommender Systems

Many large-scale recommender systems consist of two stages. The first stage efficiently screens the complete pool of items for a small subset of promising candidates, from which the second-stage model curates the final recommendations. In this paper, we investigate how to ensure group fairness for items in this two-stage architecture. In particular, we find that existing first-stage recommenders might select an irrecoverably unfair set of candidates, leaving the second-stage recommender no way to deliver fair recommendations. To address this, and motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that provide distribution-free, finite-sample guarantees on fairness in first-stage recommenders. More concretely, given any relevance model of queries and items and a pointwise lower confidence bound on the expected number of relevant items for each threshold policy, the two rules find near-optimal sets of candidates that contain enough relevant items in expectation from each group of items. To instantiate the rules, we demonstrate how to derive such confidence bounds from potentially partial and biased user feedback data, which is abundant in many large-scale recommender systems. In addition, we provide both finite-sample and asymptotic analyses of how close the thresholds chosen by the two selection rules are to the optimal thresholds. Beyond this theoretical analysis, we show empirically that, across a wide range of settings, the two rules consistently select enough relevant items from each group while minimizing the size of the candidate sets.
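To make the idea of a threshold-policy selection rule concrete, the following is a minimal sketch and not the paper's actual rules. It assumes fully observed, unbiased relevance labels on a set of calibration queries, a Hoeffding lower confidence bound, and a Bonferroni correction over a finite grid of thresholds and over groups. The names hoeffding_lcb, select_threshold, select_group_thresholds, and max_items are hypothetical. For each group, it picks the highest threshold (that is, the smallest candidate set) whose lower confidence bound on the expected number of relevant selected items still meets the target.

```python
import math


def hoeffding_lcb(values, upper, delta):
    """Lower confidence bound on the mean of i.i.d. values in [0, upper].

    Holds with probability at least 1 - delta (Hoeffding's inequality).
    """
    n = len(values)
    mean = sum(values) / n
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_threshold(calibration, thresholds, target, delta, max_items):
    """Return the highest threshold whose LCB meets the target, or None.

    calibration: one list per query of (score, is_relevant) pairs, covering
        the items of a single group (assumed fully labeled; an assumption).
    max_items: known upper bound on the number of group items per query.
    """
    # Bonferroni correction so the bounds hold jointly over the grid.
    per_test_delta = delta / len(thresholds)
    for tau in sorted(thresholds, reverse=True):
        # Relevant items that the policy "select if score >= tau" retrieves.
        counts = [sum(1 for s, r in query if s >= tau and r)
                  for query in calibration]
        if hoeffding_lcb(counts, max_items, per_test_delta) >= target:
            return tau
    return None  # no threshold on the grid can be certified


def select_group_thresholds(calibration_by_group, thresholds, target,
                            delta, max_items):
    """Apply the rule to each group, splitting delta across groups."""
    per_group_delta = delta / len(calibration_by_group)
    return {
        group: select_threshold(queries, thresholds, target,
                                per_group_delta, max_items)
        for group, queries in calibration_by_group.items()
    }
```

A result of None for a group signals that no threshold on the grid can be certified at the requested confidence level, in which case a lower threshold grid or more calibration data would be needed. Handling partial and biased feedback, as the abstract describes, would require a different estimator than the plain sample mean assumed here.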