Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling

Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-based sequential recommendation methods struggle to obtain effective user feedback from the environment, so modeling the user state effectively and shaping an appropriate reward for recommendation remain open challenges. In this paper, we leverage the language understanding capabilities of large language models (LLMs) and adapt an LLM as an environment (LE) to enhance RL-based recommenders. The LE is learned from a subset of user-item interaction data, thus reducing the need for large amounts of training data, and can synthesise user feedback for offline data by: (i) acting as a state model that produces high-quality states that enrich the user representation, and (ii) functioning as a reward model that accurately captures nuanced user preferences on actions. Moreover, the LE can generate positive actions that augment the limited offline training data. We propose an LE Augmentation (LEA) method to further improve recommendation performance by jointly optimising the supervised component and the RL policy, using the augmented actions and historical user signals. We use LEA together with the state and reward models, in conjunction with state-of-the-art RL recommenders, and report experimental results on two publicly available datasets.
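The joint optimisation described in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulation, so everything below is an assumption made for illustration: a cross-entropy supervised term on the logged item, an extra cross-entropy term on an LE-generated positive action, and a squared one-step TD error that uses an LE-provided reward. The function names, the `rl_weight` parameter, and the discount `gamma` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of item scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_loss(logits, target_item):
    """Cross-entropy of one target item under the recommender's scores."""
    return -math.log(softmax(logits)[target_item])

def td_loss(q_values, next_q_values, action, reward, gamma=0.5):
    """Squared one-step TD error for the taken action (assumed RL objective)."""
    target = reward + gamma * max(next_q_values)
    return (q_values[action] - target) ** 2

def joint_loss(logits, q_values, next_q_values, logged_item, le_reward,
               augmented_item=None, rl_weight=1.0):
    """Supervised term plus weighted RL term.

    le_reward stands in for the output of the LE reward model, and
    augmented_item for a positive action generated by the LE; both are
    placeholders here, not calls to a real model.
    """
    loss = supervised_loss(logits, logged_item)
    if augmented_item is not None:
        # Treat the LE-generated positive action as an extra supervised target.
        loss += supervised_loss(logits, augmented_item)
    loss += rl_weight * td_loss(q_values, next_q_values, logged_item, le_reward)
    return loss

# Toy example with two candidate items and hand-picked values.
example = joint_loss(
    logits=[0.0, 0.0],
    q_values=[1.0, 0.0],
    next_q_values=[0.0, 2.0],
    logged_item=0,
    le_reward=1.0,
    augmented_item=1,
)
print(round(example, 4))
```

In a real system the scores and Q-values would come from a sequential recommender and the reward from the learned LE; the sketch only shows how the supervised and RL terms might be summed into one objective.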

