Reinforcement learning (RL)-based recommender systems (RSs) have attracted considerable attention because they can learn optimal recommendation policies that maximize long-term user rewards. However, deploying RL models directly in online environments and collecting authentic data through A/B tests is challenging and resource-intensive. Simulators offer an alternative by providing training and evaluation environments for RS models, which reduces reliance on real-world data. Existing simulators have shown promising results but still have several limitations: user feedback is simplified, the simulated behavior is inconsistent with real-world data, the simulators themselves are difficult to evaluate, and they are hard to migrate and extend across RSs. To address these challenges, we propose KuaiSim, a comprehensive user environment that provides user feedback with multi-behavior and cross-session responses. The resulting simulator supports three levels of recommendation problems: the request-level list-wise recommendation task, the whole-session-level sequential recommendation task, and the cross-session-level retention optimization task. For each task, KuaiSim also provides evaluation protocols and baseline recommendation algorithms that serve as benchmarks for future research. We also restructure existing competitive simulators on the KuaiRand dataset and compare them against KuaiSim to further assess their performance and behavioral differences. Finally, to show that KuaiSim can accommodate different datasets, we deploy it on the ML-1m dataset and demonstrate its versatility and robustness.
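To make the three task levels more concrete, here is a minimal, hypothetical Python sketch of a user environment that returns multi-behavior feedback per request, rolls out a whole session, and emits a cross-session return signal. It is not KuaiSim's actual interface. The names `ToyUserEnv`, `BEHAVIORS`, `step`, `return_gap`, and `run_session`, the behavior set, the engagement probability, the per-session request cap, and the return-gap heuristic are all illustrative assumptions.

```python
import random

# Hypothetical behavior types; a real simulator may model a different set.
BEHAVIORS = ("click", "like", "comment")


class ToyUserEnv:
    """Toy user environment (illustrative only, NOT KuaiSim's API)."""

    def __init__(self, max_requests: int = 5, p_engage: float = 0.3, seed: int = 0):
        self.max_requests = max_requests  # assumed cap on requests per session
        self.p_engage = p_engage          # assumed per-item engagement probability
        self.rng = random.Random(seed)
        self.requests_left = max_requests

    def step(self, item_list):
        """Request level: return one 0/1 response per item for each behavior."""
        feedback = {
            b: [int(self.rng.random() < self.p_engage) for _ in item_list]
            for b in BEHAVIORS
        }
        self.requests_left -= 1
        # Toy leave model: the session ends when the request budget runs out.
        done = self.requests_left == 0
        return feedback, done

    def return_gap(self, session_clicks: int) -> int:
        """Cross-session level: more clicks give a shorter gap (in days) before return."""
        gap = max(1, 5 - session_clicks // 3)
        self.requests_left = self.max_requests  # reset for the next session
        return gap


def run_session(env: ToyUserEnv, policy):
    """Whole-session level: roll out one session, sum clicks, then get the return gap."""
    total_clicks, done = 0, False
    while not done:
        feedback, done = env.step(policy())
        total_clicks += sum(feedback["click"])
    return total_clicks, env.return_gap(total_clicks)


if __name__ == "__main__":
    env = ToyUserEnv(seed=42)
    random_policy = lambda: random.sample(range(100), 6)  # recommend 6 of 100 items
    clicks, gap = run_session(env, random_policy)
    print(f"session clicks={clicks}, return gap={gap} day(s)")
```

In this sketch, a list-wise policy would be trained on `step` feedback, a sequential policy on the summed session reward from `run_session`, and a retention policy on the value returned by `return_gap`. A realistic simulator would learn these response and return models from logged data rather than using fixed probabilities and a hand-written heuristic.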