Traditional recommendation settings tend to cater excessively to users' immediate interests while neglecting their long-term engagement. To address this, it is crucial to incorporate planning capabilities into the recommendation decision-making process, so as to develop policies that account for both immediate interests and long-term engagement. Although Reinforcement Learning (RL) can learn planning capacity by maximizing cumulative reward, the scarcity of recommendation data makes training RL models from scratch unstable and prone to overfitting. In this context, we propose to leverage the remarkable planning capabilities of Large Language Models (LLMs) over sparse data for long-term recommendation. The key lies in enabling a language model to understand and effectively apply task-solving principles in personalized recommendation scenarios: because the model's pre-training may not naturally encompass these principles, the model must be inspired or taught to use them. To achieve this, we propose a Bi-level Learnable LLM Planner framework, which combines macro-learning and micro-learning through a hierarchical mechanism. The framework includes a Planner and a Reflector for acquiring high-level guiding principles, and an Actor-Critic component for personalized planning. Extensive experiments validate the superiority of the framework in learning to plan for long-term recommendation.
翻译:传统推荐设置往往过度迎合用户的即时兴趣,而忽视其长期参与。为解决这一问题,关键是将规划能力融入推荐决策过程,以制定兼顾即时兴趣与长期参与的策略。尽管强化学习可通过最大化累积奖励学习规划能力,但推荐数据的稀缺性导致从头训练强化学习模型时面临不稳定性和易过拟合等挑战。在此背景下,我们提出利用大语言模型在稀疏数据上的卓越规划能力实现长期推荐。其核心在于使语言模型能够理解并在个性化推荐场景中有效应用问题解决原则——由于模型预训练过程可能未天然包含这些原则,因此需要激发或教导模型掌握这些能力。为此,我们提出双层可学习大语言模型规划器框架,通过层级机制融合宏观学习与微观学习。该框架包含用于获取高层指导原则的规划器与反射器,以及用于实现规划个性化的演员-评论员组件。大量实验验证了该框架在学习规划长期推荐任务中的优越性。
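The abstract contrasts catering to immediate interests with optimizing long-term engagement, and notes that RL learns to plan by maximizing cumulative reward. The minimal sketch below illustrates that objective as a discounted return, G = sum_t gamma^t * r_t. The discount factor (0.9) and both reward trajectories are hypothetical numbers chosen for illustration; they are not data or results from this work, and the sketch is not the proposed framework.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Cumulative discounted reward G = sum_t gamma^t * r_t over one session."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical trajectory 1 (illustrative only): a greedy policy keeps
# recommending the user's most-clicked category. Immediate reward is high,
# but the user loses interest and leaves after three steps.
greedy_rewards = [1.0, 0.8, 0.5]

# Hypothetical trajectory 2 (illustrative only): a planning policy trades a
# little immediate reward for variety, and the user stays engaged longer.
planned_rewards = [0.7, 0.7, 0.7, 0.7, 0.7, 0.7]

greedy_return = discounted_return(greedy_rewards)
planned_return = discounted_return(planned_rewards)

print(f"greedy:  first reward {greedy_rewards[0]:.1f}, return {greedy_return:.3f}")
print(f"planned: first reward {planned_rewards[0]:.1f}, return {planned_return:.3f}")
```

Under these assumed numbers the greedy policy wins on the first step (1.0 vs 0.7) but loses on cumulative reward (about 2.125 vs 3.280), which is the trade-off a planning policy is meant to capture.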