Recently, short video platforms have achieved rapid user growth by recommending interesting content to users. The objective of the recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users). Retention is long-term feedback that emerges after multiple interactions between users and the system, and it is hard to decompose the retention reward to individual items or lists of items. Thus, traditional point-wise and list-wise models are not able to optimize retention. In this paper, we choose reinforcement learning methods to optimize retention, as they are designed to maximize long-term performance. We formulate the problem as an infinite-horizon request-based Markov Decision Process, and our objective is to minimize the accumulated time interval across multiple sessions, which is equivalent to improving app open frequency and user retention. However, current reinforcement learning algorithms cannot be directly applied in this setting due to the uncertainty, bias, and long delay time incurred by the properties of user retention. We propose a novel method, dubbed RLUR, to address these challenges. Both offline and live experiments show that RLUR can significantly improve user retention. RLUR has long been fully launched in the Kuaishou app and achieves consistent improvements in user retention and DAU.
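To make the stated objective concrete, the following is a minimal sketch of how an accumulated time interval across sessions could be scored. It is not the RLUR method itself: the function names, the discount factor `gamma`, and the session timestamps are all hypothetical, and the abstract does not specify the exact form of the reward.

```python
from typing import List, Sequence


def session_gaps(open_times: Sequence[float]) -> List[float]:
    """Time gaps between consecutive app opens (same unit as the input)."""
    return [later - earlier for earlier, later in zip(open_times, open_times[1:])]


def discounted_gap_cost(gaps: Sequence[float], gamma: float = 0.9) -> float:
    """Discounted sum of inter-session gaps (assumed form of the cost).

    A lower value means the user returns sooner, which is the direction
    the abstract describes as improving app open frequency.
    """
    return sum((gamma ** i) * gap for i, gap in enumerate(gaps))


# Hypothetical session start times, in hours, for two illustrative users.
frequent_user = [0.0, 8.0, 16.0, 24.0]
infrequent_user = [0.0, 24.0, 72.0]

frequent_cost = discounted_gap_cost(session_gaps(frequent_user))
infrequent_cost = discounted_gap_cost(session_gaps(infrequent_user))
```

Under these assumptions, the user who returns every 8 hours incurs a lower cost than the user with gaps of 24 and 48 hours, so a policy that minimizes this quantity favors more frequent returns.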