Planning for both immediate and long-term benefits is becoming increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn this planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data makes RL models trained from scratch unstable and prone to overfitting, resulting in sub-optimal performance. In light of this, we propose to leverage the remarkable planning capabilities of Large Language Models (LLMs) over sparse data for long-term recommendation. The key to achieving this goal lies in formulating a guidance plan that follows the principles of enhancing long-term engagement, and in grounding that plan in effective, executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks the learning process down into macro-learning and micro-learning, which learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at https://github.com/jizhi-zhang/BiLLP.