Interactive recommender systems can dynamically adapt to user feedback, but they often suffer from content homogeneity and filter-bubble effects caused by overfitting to short-term user preferences. While recent efforts aim to improve content diversity, they predominantly operate in static or one-shot settings, neglecting the long-term evolution of user interests. Reinforcement learning provides a principled framework for optimizing long-term user satisfaction by modeling sequential decision-making processes. However, its application in recommendation is hindered by sparse, long-tailed user-item interactions and limited semantic planning capabilities. In this work, we propose LLM-Enhanced Reinforcement Learning (LERL), a novel hierarchical recommendation framework that integrates the semantic planning power of LLMs with the fine-grained adaptability of RL. LERL consists of a high-level LLM-based planner that selects semantically diverse content categories, and a low-level RL policy that recommends personalized items within the selected semantic space. This hierarchical design narrows the action space, enhances planning efficiency, and mitigates overexposure to redundant content. Extensive experiments on real-world datasets demonstrate that LERL significantly improves long-term user satisfaction compared with state-of-the-art baselines. The implementation of LERL is available at https://github.com/1163710212/LERL.
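The hierarchical decision loop described above can be sketched as follows. This is a minimal illustration only: the planner and policy stubs, the catalog, and all function names are hypothetical stand-ins, not the paper's actual LLM planner or trained RL policy.

```python
import random

# Hypothetical item catalog grouped by content category (illustrative only).
CATALOG = {
    "sports": ["s1", "s2", "s3"],
    "music":  ["m1", "m2"],
    "news":   ["n1", "n2", "n3", "n4"],
}

def llm_planner(user_history, categories):
    """High-level planner: stand-in for the LLM that selects a semantically
    diverse category, here by avoiding the most recently shown ones."""
    recent = {cat for cat, _ in user_history[-2:]}
    fresh = [c for c in categories if c not in recent]
    return random.choice(fresh or list(categories))

def rl_policy(user_history, items):
    """Low-level policy: stand-in for the RL agent that picks a personalized
    item inside the chosen category, here the first item not yet seen."""
    seen = {item for _, item in user_history}
    unseen = [i for i in items if i not in seen]
    return (unseen or items)[0]

def recommend(user_history):
    # Hierarchical decomposition: the planner narrows the action space to one
    # category, then the policy acts within that reduced space.
    category = llm_planner(user_history, CATALOG.keys())
    item = rl_policy(user_history, CATALOG[category])
    return category, item

history = [("sports", "s1"), ("sports", "s2")]
category, item = recommend(history)
```

Because the planner restricts the policy's choices to a single category, the low-level action space shrinks from the full catalog to a handful of items, which is the efficiency and diversity benefit the hierarchical design targets.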