Modern recommender systems are built upon computation-intensive infrastructure, and performing real-time computation for every request is challenging, especially in peak periods, due to limited computational resources. Serving recommendations from user-wise result caches is widely used when the system cannot afford real-time recommendation. However, allocating real-time and cached recommendations so as to maximize users' overall engagement is challenging. This paper identifies two key challenges in cache allocation, namely the value-strategy dependency and the streaming allocation problem. We then propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework consisting of a prediction stage and an allocation stage. The prediction stage estimates the values of the cache choices while taking the value-strategy dependency into account, and the allocation stage determines the cache choice for each individual request while satisfying the global budget constraint. We show that training RPAF is challenging due to the globality and strictness of the budget constraint, and propose a relaxed local allocator (RLA) to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.
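To make the two-stage "predict then allocate" idea concrete, the following is a minimal illustrative sketch, not the paper's RPAF, RLA, or PoolRank algorithm. It assumes each request already carries hypothetical predicted values `value_realtime` and `value_cached`, and uses a simple greedy uplift rule to grant real-time computation under a global budget; the actual allocation stage in the paper handles streaming requests and trains the value estimates with reinforcement learning.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch only: a simplified prediction-allocation loop for
# choosing between real-time and cached recommendations under a global
# computation budget. The fields and the greedy rule are hypothetical
# stand-ins, not the paper's method.

@dataclass
class Request:
    request_id: int
    value_realtime: float  # predicted engagement if served in real time
    value_cached: float    # predicted engagement if served from the user-wise cache

def allocate(requests: List[Request], budget: int) -> Dict[int, str]:
    """Grant real-time computation to the requests with the largest predicted
    uplift (real-time value minus cached value) until the budget of real-time
    computations is exhausted; all other requests are served from the cache."""
    uplifts = [(r.value_realtime - r.value_cached, r.request_id) for r in requests]
    chosen = {rid for _, rid in heapq.nlargest(budget, uplifts)}
    return {r.request_id: ("realtime" if r.request_id in chosen else "cached")
            for r in requests}

if __name__ == "__main__":
    reqs = [Request(0, 0.9, 0.5), Request(1, 0.6, 0.55), Request(2, 0.8, 0.3)]
    print(allocate(reqs, budget=2))  # {0: 'realtime', 1: 'cached', 2: 'realtime'}
```

This batch-style greedy allocation ignores the streaming nature of real traffic, which is exactly the gap the paper's PoolRank algorithm is designed to address.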