We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationships with a recommender system. Measurement, attribution, and coordination challenges complicate algorithm design. We describe careful modeling -- including a new representation of user state and key conditional independence assumptions -- which overcomes these challenges and leads to simple, testable recommender system prototypes. We apply our approach to a podcast recommender system that makes personalized recommendations to hundreds of millions of listeners. A/B tests demonstrate that purposefully optimizing for long-term outcomes leads to large performance gains over conventional approaches that optimize for short-term proxies.