Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning



Reward function · Offline reinforcement learning · Reinforcement learning · Datasets · Recommender systems

April 17, 2023



Siyu Wang, Xiaocong Chen, Dietmar Jannach, Lina Yao

Reinforcement learning-based recommender systems have recently gained popularity. However, designing the reward function, on which the agent relies to optimize its recommendation policy, is often not straightforward. Exploring the causality underlying users' behavior can take the place of the reward function in guiding the agent to capture users' dynamic interests. Moreover, because of the typical limitations of simulation environments (e.g., data inefficiency), most existing work cannot be applied broadly in large-scale settings. Although some works attempt to convert offline datasets into simulators, data inefficiency makes the learning process even slower. Because reinforcement learning by nature learns through interaction, the agent cannot collect enough training data within a single interaction. Furthermore, traditional reinforcement learning algorithms lack the ability of supervised learning methods to learn directly from offline datasets. In this paper, we propose a new model, the causal decision transformer for recommender systems (CDT4Rec). CDT4Rec is an offline reinforcement learning system that learns from a dataset rather than from online interaction. Moreover, CDT4Rec employs the transformer architecture, which can process large offline datasets and capture both short-term and long-term dependencies within the data to estimate the causal relationships among actions, states, and rewards. To demonstrate the feasibility and superiority of our model, we conducted experiments on six real-world offline datasets and one online simulator.
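The abstract says that CDT4Rec feeds logged interactions to a transformer, but it does not give the input format. The sketch below assumes that CDT4Rec tokenizes trajectories the way the original Decision Transformer does. Under that assumption, each logged session becomes an interleaved sequence of (return-to-go, state, action) tokens. The return-to-go at step t is the (optionally discounted) sum of rewards from t to the end of the session. A causal attention mask then lets every token attend to all earlier tokens but to none later. The helper names `returns_to_go`, `build_tokens` and `causal_mask`, and the click-based toy session, are hypothetical. The sketch does not reproduce the paper's causal estimation component.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go for every step t: sum over t' >= t of gamma**(t' - t) * r[t'].

    Hypothetical helper; assumes Decision Transformer-style conditioning.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the already-accumulated future return.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_tokens(states: Sequence[object],
                 actions: Sequence[object],
                 rewards: Sequence[float]) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) tokens in time order (hypothetical)."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")
    tokens: List[Tuple[str, object]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("R", g), ("s", s), ("a", a)])
    return tokens


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical logged session: state = last clicked item,
    # action = recommended item, reward = 1.0 for a click, 0.0 otherwise.
    states = ["item_3", "item_7", "item_7"]
    actions = ["item_7", "item_2", "item_9"]
    rewards = [1.0, 0.0, 1.0]

    print(returns_to_go(rewards))  # [2.0, 1.0, 1.0]
    tokens = build_tokens(states, actions, rewards)
    print(len(tokens))  # 9
    mask = causal_mask(len(tokens))
    print(mask[0][:3])  # [True, False, False]
```

Masking future tokens lets a single transformer trained only on offline logs draw on both recent and distant history when it predicts the next recommendation. In Decision Transformer-style models, the action is predicted from the output at the corresponding state token, conditioned on a chosen target return.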

