Evaluation of policies in recommender systems typically involves A/B testing, i.e., live experiments on real users that assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential harm to user retention. These costs can be especially problematic when developing policies for "onboarding" new users, since onboarding occurs only once per user. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of "preference elicitation" algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live. We describe our domain, our simulation models and platform, and the results of our experiments and deployment, and we suggest future steps needed to advance realistic simulation as a powerful complement to live experiments.