We propose a domain-adapted reward model that works alongside an Offline A/B testing system to evaluate ranking models. The approach measures the reward of ranking-model changes in large-scale Ads recommender systems, where model-free methods such as inverse propensity scoring (IPS) are not feasible. Our experiments show that the proposed technique outperforms both vanilla IPS and approaches that use non-generalized reward models.
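To make the contrast between model-free and reward-model-based evaluation concrete, the following minimal sketch computes both kinds of estimate on a toy log. It is an illustration under stated assumptions, not the method proposed here: the log format, the function names (`ips_estimate`, `direct_estimate`), the lookup-table reward model, and the toy data are all hypothetical, and the reward model shown is neither domain-adapted nor tied to any particular Offline A/B testing system.

```python
from typing import Callable, List, Sequence, Tuple

# One logged impression (hypothetical format):
# (context, action shown, observed reward, probability the logging policy showed it)
LoggedEvent = Tuple[str, str, float, float]


def ips_estimate(
    logs: Sequence[LoggedEvent],
    target_prob: Callable[[str, str], float],
) -> float:
    """Model-free IPS: reweight each observed reward by target / logging probability."""
    total = 0.0
    for context, action, reward, propensity in logs:
        total += target_prob(context, action) / propensity * reward
    return total / len(logs)


def direct_estimate(
    logs: Sequence[LoggedEvent],
    actions: Sequence[str],
    target_prob: Callable[[str, str], float],
    reward_model: Callable[[str, str], float],
) -> float:
    """Model-based: average the reward model's prediction under the target policy."""
    total = 0.0
    for context, _action, _reward, _propensity in logs:
        total += sum(target_prob(context, a) * reward_model(context, a) for a in actions)
    return total / len(logs)


# Toy data (assumed): a uniform logging policy over two ads.
ACTIONS: List[str] = ["adA", "adB"]
LOGS: List[LoggedEvent] = [
    ("u1", "adA", 1.0, 0.5),
    ("u1", "adB", 0.0, 0.5),
    ("u2", "adA", 0.0, 0.5),
    ("u2", "adB", 1.0, 0.5),
]


def always_ad_a(context: str, action: str) -> float:
    """Hypothetical target policy: always show adA."""
    return 1.0 if action == "adA" else 0.0


# Stand-in reward model: a lookup table instead of a trained model.
PREDICTED = {("u1", "adA"): 1.0, ("u1", "adB"): 0.0, ("u2", "adA"): 0.0, ("u2", "adB"): 1.0}


def toy_reward_model(context: str, action: str) -> float:
    return PREDICTED[(context, action)]


ips_value = ips_estimate(LOGS, always_ad_a)
dm_value = direct_estimate(LOGS, ACTIONS, always_ad_a, toy_reward_model)
print(ips_value, dm_value)
```

IPS needs a known, nonzero logging propensity for every logged action, which is the requirement that becomes hard to meet at large scale. The model-based estimate does not need propensities, but it is only as good as the reward model's predictions on the actions the new ranking model would choose.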