Reinforcement learning-based recommender systems have recently gained popularity. However, designing the reward function, on which the agent relies to optimize its recommendation policy, is often not straightforward. Exploring the causality underlying users' behavior can take the place of the reward function in guiding the agent to capture users' dynamic interests. Moreover, because of the typical limitations of simulation environments (e.g., data inefficiency), most existing work cannot be broadly applied in large-scale settings. Although some works attempt to convert an offline dataset into a simulator, data inefficiency makes the learning process even slower. Because reinforcement learning learns by interaction, an agent cannot collect enough data for training during a single interaction. Furthermore, traditional reinforcement learning algorithms, unlike supervised learning methods, lack a solid capability to learn directly from offline datasets. In this paper, we propose a new model named the causal decision transformer for recommender systems (CDT4Rec). CDT4Rec is an offline reinforcement learning system that learns from a dataset rather than from online interaction. Moreover, CDT4Rec employs the transformer architecture, which can process large offline datasets and capture both short-term and long-term dependencies within the data, to estimate the causal relationship between action, state, and reward. To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.