Reinforcement Learning (RL)-based recommender systems (RSs) have garnered considerable attention due to their ability to learn optimal recommendation policies and maximize long-term user rewards. However, deploying RL models directly in online environments and generating authentic data through A/B tests is challenging and requires substantial resources. Simulators offer an alternative by providing training and evaluation environments for RS models, reducing reliance on real-world data. Existing simulators have shown promising results but also have limitations, such as simplified user feedback, inconsistency with real-world data, the difficulty of evaluating the simulators themselves, and difficulties in migrating and extending them across RSs. To address these challenges, we propose KuaiSim, a comprehensive user environment that provides user feedback with multi-behavior and cross-session responses. The resulting simulator can support three levels of recommendation problems: the request-level list-wise recommendation task, the whole-session-level sequential recommendation task, and the cross-session-level retention optimization task. For each task, KuaiSim also provides evaluation protocols and baseline recommendation algorithms that further serve as benchmarks for future research. We also restructure existing competitive simulators on the KuaiRand dataset and compare them against KuaiSim to further assess their performance and behavioral differences. Furthermore, to showcase KuaiSim's flexibility in accommodating different datasets, we deploy it on the ML-1m dataset and demonstrate its versatility and robustness.