Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies

Evaluation of policies in recommender systems typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential impact on user retention. In developing policies for "onboarding" new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of "preference elicitation" algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live. We describe our domain, our simulation models and platform, results of experiments and deployment, and suggest future steps needed to further realistic simulation as a powerful complement to live experiments.

