Training reinforcement learning-based recommender systems is often hindered by the lack of dynamic and realistic user interactions. Lusifer, a novel environment leveraging Large Language Models (LLMs), addresses this limitation by generating simulated user feedback. It synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items. In addition, user profiles are updated after each rating to reflect evolving user characteristics. Using the MovieLens100K dataset as a proof of concept, Lusifer demonstrates accurate emulation of user behavior and preferences. This paper presents Lusifer's operational pipeline, including prompt generation and iterative user profile updates. Beyond validating Lusifer's ability to produce realistic, dynamic feedback, this work offers a scalable and adjustable framework that future research can use to train reinforcement learning systems for online recommendation.
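To make the described pipeline concrete, the following is a minimal sketch of one simulated interaction step: build a prompt from the user's profile and recent history, obtain a rating from an LLM, and then ask the LLM to revise the profile after the new rating. All names (UserProfile, build_rating_prompt, simulate_step, query_llm) and the prompt wording are hypothetical illustrations, not the authors' actual implementation; the LLM call is stubbed so the sketch runs as-is.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical schema; the paper's actual profile format is not specified here.
@dataclass
class UserProfile:
    summary: str                                        # natural-language description of tastes
    history: List[str] = field(default_factory=list)    # recent "title -> rating" records

def build_rating_prompt(profile: UserProfile, item_title: str) -> str:
    """Assemble an LLM prompt from the simulated user's profile and history."""
    recent = "\n".join(profile.history[-10:])  # keep the prompt short
    return (
        "You are simulating a movie watcher.\n"
        f"Profile: {profile.summary}\n"
        f"Recent ratings:\n{recent}\n"
        f"Rate the movie '{item_title}' from 1 to 5 and explain briefly."
    )

def simulate_step(profile: UserProfile,
                  item_title: str,
                  query_llm: Callable[[str], str]) -> str:
    """One Lusifer-style interaction: rate the item, then refresh the profile."""
    feedback = query_llm(build_rating_prompt(profile, item_title))
    profile.history.append(f"{item_title} -> {feedback}")
    # Iterative profile update: ask the LLM to rewrite the taste summary
    # so the evolving preferences are reflected in later prompts.
    profile.summary = query_llm(
        f"Previous profile: {profile.summary}\n"
        f"New interaction: {item_title} rated as: {feedback}\n"
        "Rewrite the profile to reflect any preference shift."
    )
    return feedback

# Usage with a stub LLM (replace the lambda with a real model call in practice).
profile = UserProfile(summary="Enjoys 90s sci-fi, dislikes slow dramas.",
                      history=["Blade Runner -> 5", "The English Patient -> 2"])
print(simulate_step(profile, "The Matrix", lambda prompt: "5"))
```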