An ultimate goal of recommender systems (RS) is to improve user engagement. Reinforcement learning (RL) is a promising paradigm for this goal, as it directly optimizes the overall performance of sequential recommendation. However, many existing RL-based approaches incur substantial computational overhead, because they require not only the recommended items but also all other candidate items to be stored. This paper proposes an efficient alternative that does not require the candidate items. The idea is to model the correlation between user engagement and items directly from data. Moreover, the proposed approach considers randomness in user feedback and in termination behavior, both of which are ubiquitous in RS but rarely discussed in prior RL-based work. Online A/B experiments on real-world RS confirm the efficacy of the proposed approach and the importance of modeling these two types of randomness.