Recently, short video platforms have achieved rapid user growth by recommending interesting content to users. The objective of the recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users). Retention is long-term feedback that emerges only after multiple interactions between users and the system, and it is hard to decompose the retention reward into contributions from individual items or lists of items. Thus, traditional point-wise and list-wise models cannot optimize retention. In this paper, we choose reinforcement learning methods to optimize retention, as they are designed to maximize long-term performance. We formulate the problem as an infinite-horizon request-based Markov Decision Process, and our objective is to minimize the accumulated time interval across multiple sessions, which is equivalent to improving app open frequency and user retention. However, current reinforcement learning algorithms cannot be directly applied in this setting due to the uncertainty, bias, and long delay time incurred by the properties of user retention. We propose a novel method, dubbed RLUR, to address these challenges. Both offline and live experiments show that RLUR can significantly improve user retention. RLUR has been fully launched in the Kuaishou app for a long time and achieves consistent improvements in user retention and DAU.
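The abstract states the retention objective only at a high level. As a minimal sketch, assuming that each request's cost is the returning time (the gap until the user's next session) when the request ends a session and zero otherwise, and assuming a discount factor `gamma` so that the infinite-horizon sum stays bounded, the accumulated time interval a policy tries to minimize can be computed as below. The `Request` type, its fields, and both helper functions are hypothetical illustrations, not RLUR's actual reward definition or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    # Hypothetical record of one recommendation request in a user trajectory.
    session_id: int
    is_last_in_session: bool
    # Hypothetical field: hours until the user's next session starts.
    # Only used when is_last_in_session is True.
    returning_time: float = 0.0


def retention_costs(requests: List[Request]) -> List[float]:
    """Assumed per-request cost: the returning time on the last request
    of a session, and 0 on every other request."""
    return [r.returning_time if r.is_last_in_session else 0.0 for r in requests]


def discounted_return(costs: List[float], gamma: float) -> float:
    """Discounted sum of costs, accumulated backwards from the last request.
    A retention-oriented policy would aim to minimize this value."""
    total = 0.0
    for cost in reversed(costs):
        total = cost + gamma * total
    return total


# Usage: two sessions each, but user A returns more slowly than user B.
user_a = [Request(0, False), Request(0, True, 10.0),
          Request(1, False), Request(1, True, 20.0)]
user_b = [Request(0, False), Request(0, True, 5.0),
          Request(1, False), Request(1, True, 5.0)]
print(discounted_return(retention_costs(user_a), 1.0))  # 30.0
print(discounted_return(retention_costs(user_b), 1.0))  # 10.0
```

Under these assumptions, a user who comes back sooner accumulates a lower cost, which is why minimizing this sum corresponds to higher app open frequency and retention. Because the cost only appears at session boundaries, no individual item or item list receives a direct share of it, which illustrates the credit-assignment difficulty the abstract describes.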