We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining the reward (whether the user successfully interacted with the slate) and the rank (which item within the slate was selected). PRR outperforms existing off-policy reward-optimizing methods and scales far better to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for low-latency domains such as computational advertising.
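To make the MIPS-powered delivery step concrete, the following is a minimal, illustrative sketch: if item scores are inner products between a user representation and item representations, the best items can be retrieved by maximizing that inner product. The embeddings, names, and brute-force search here are hypothetical and not part of the paper; production systems would use an approximate MIPS index rather than exhaustive scoring.

```python
# Illustrative brute-force MIPS retrieval (hypothetical embeddings, not the
# paper's actual model or data). Scores each item by its inner product with
# the user vector and returns the indices of the top-k items.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mips_top_k(user_vec, item_vecs, k):
    """Return indices of the k items whose embeddings have the largest
    inner product with the user embedding."""
    scores = [(dot(user_vec, v), i) for i, v in enumerate(item_vecs)]
    scores.sort(reverse=True)  # highest inner product first
    return [i for _, i in scores[:k]]

# Hypothetical user and item embeddings in a 2-d space.
user = [0.5, 1.0]
items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(mips_top_k(user, items, 2))  # → [2, 1]
```

In a low-latency setting, the same argmax-of-inner-product structure lets the brute-force loop be swapped for an approximate nearest-neighbor index without changing the model.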