Reinforcement learning (RL) has gained popularity in recommender systems for its ability to optimize long-term rewards and guide users toward relevant content. However, successfully applying RL to recommender systems is difficult for several reasons, chief among them the scarcity of online interaction data: training on-policy methods requires expensive live interaction with real users. Moreover, building evaluation frameworks that accurately reflect model quality remains a fundamental open problem in recommender systems. To address these challenges, we propose a novel, modular framework of synthetic environments that simulate human behavior by harnessing large language models (LLMs) as synthetic users, and we use it to train RL-based recommender systems. We complement the framework with in-depth ablation studies and demonstrate its effectiveness in experiments on movie and book recommendations. The software, including the RL environment, is publicly available on GitHub.
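To make the core idea concrete, the sketch below shows one way an LLM-as-user environment of this kind could be structured: the agent recommends an item, a prompted LLM plays the user and returns a rating, and that rating serves as the RL reward. This is a minimal illustration under our own assumptions, not the paper's released code; the class name `LLMUserEnv`, the `query_llm` callable (stubbed here with random ratings), the persona string, the 0–5 rating scale, and the episode `horizon` are all hypothetical choices.

```python
# Hypothetical sketch of an LLM-driven user simulator for RL training.
# `stub_llm` stands in for a real chat-completion call to any LLM backend.

import random
from typing import Callable, List, Tuple

def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a rating string '0'..'5'."""
    return str(random.randint(0, 5))

class LLMUserEnv:
    """One simulated user: the agent recommends an item, the LLM-as-user
    replies with a rating, and the rating becomes the RL reward."""

    def __init__(self, catalog: List[str], persona: str,
                 query_llm: Callable[[str], str] = stub_llm,
                 horizon: int = 10):
        self.catalog = catalog        # item titles the agent can recommend
        self.persona = persona        # natural-language user description
        self.query_llm = query_llm    # LLM backend (stubbed here)
        self.horizon = horizon        # episode length in interactions
        self.history: List[Tuple[str, int]] = []

    def reset(self) -> List[Tuple[str, int]]:
        self.history = []
        return self.history           # observation: the interaction history

    def step(self, action: int):
        item = self.catalog[action]
        prompt = (f"You are a user who {self.persona}.\n"
                  f"Previously rated items: {self.history}\n"
                  f"Rate the recommended item '{item}' from 0 to 5. "
                  f"Answer with a single digit.")
        rating = int(self.query_llm(prompt).strip()[0])  # parse the LLM reply
        self.history.append((item, rating))
        done = len(self.history) >= self.horizon
        return self.history, float(rating), done, {}

# Usage: a random policy interacting with a movie-recommendation episode.
env = LLMUserEnv(["Alien", "Amélie", "Heat"], "enjoys sci-fi thrillers")
obs = env.reset()
done = False
while not done:
    obs, reward, done, _ = env.step(random.randrange(len(env.catalog)))
```

Keeping the LLM behind a plain callable like `query_llm` is what makes such a setup modular: the same environment can be reused with different LLM backends, personas, or item catalogs (e.g., movies vs. books) without changing the RL training loop.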