Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time - 专知论文

会员服务 ·

0

优化器 · Learning · 情景 · 强化学习 · 方差减小 ·

2023 年 5 月 24 日

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

翻译：折扣MDP中具有短预热时间的遗憾最优无模型强化学习

Xiang Ji,Gen Li

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret optimality or have to incur a high memory and computational cost. In addition, existing optimal algorithms all require a long burn-in time in order to achieve optimal sample efficiency, i.e., their optimality is not guaranteed unless sample size surpasses a high threshold. We address both open problems by introducing a model-free algorithm that employs variance reduction and a novel technique that switches the execution policy in a slow-yet-adaptive manner. This is the first regret-optimal model-free algorithm in the discounted setting, with the additional benefit of a low burn-in time.

翻译：强化学习中的一个关键问题是学习最优策略。本文在在线设置下，针对表格型无限时域折扣马尔可夫决策过程研究该问题。现有算法要么无法实现遗憾最优性，要么需要承担高昂的内存和计算成本。此外，现有最优算法都需要较长的预热时间才能实现最优样本效率——即其最优性仅在样本量超过高阈值时才能得到保证。我们通过引入一种采用方差缩减技术以及一种缓慢但自适应切换执行策略的新颖技术的无模型算法，同时解决了这两个开放性问题。这是折扣设置下首个具有遗憾最优性的无模型算法，且额外具备低预热时间的优势。

0

相关内容

优化器

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

rs4969170GG基因型抑制SOCS3基因转录活性促进肝癌发生发展的功能机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

AFF1和AFF4形成分子开关调控细胞分化

国家自然科学基金

0+阅读 · 2015年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

Kupffer细胞上GITRL在大鼠肝移植免疫耐受重建中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Spire1蛋白在胰腺β细胞胰岛素分泌中的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Doublecortin的动态表达在骨折愈合中的作用与调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

复形范畴中的Gorenstein同调维数

国家自然科学基金

0+阅读 · 2009年12月31日

On the Effective Horizon of Inverse Reinforcement Learning

Arxiv

0+阅读 · 2023年7月13日

On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process

Arxiv

0+阅读 · 2023年7月12日

Fast Rates for the Regret of Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年7月12日

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

Arxiv

0+阅读 · 2023年7月11日

Reinforcement Learning with Non-Cumulative Objective

Arxiv

0+阅读 · 2023年7月11日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox

Arxiv

11+阅读 · 2022年12月1日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Arxiv

34+阅读 · 2022年6月30日

Coding for Distributed Multi-Agent Reinforcement Learning

Arxiv

32+阅读 · 2021年1月7日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

VIP会员

文章信息

相关主题

最新内容

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

专知会员服务

2+阅读 · 今天14:36

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

专知会员服务

2+阅读 · 今天14:21

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

专知会员服务

1+阅读 · 今天14:27

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

专知会员服务

1+阅读 · 今天14:22

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

专知会员服务

4+阅读 · 7月17日

《边缘端实时无线感知赋能现场多机器人部署》200页

《边缘端实时无线感知赋能现场多机器人部署》200页

专知会员服务

6+阅读 · 7月17日

战力倍增器：自主武器系统与乌克兰及加沙冲突

战力倍增器：自主武器系统与乌克兰及加沙冲突

专知会员服务

4+阅读 · 7月17日

人工智能赋能战场情报：提速决策进程

人工智能赋能战场情报：提速决策进程

专知会员服务

2+阅读 · 7月17日

《拥抱新兴技术：面向未来军官的教育革新》

《拥抱新兴技术：面向未来军官的教育革新》

专知会员服务

5+阅读 · 7月17日

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

专知会员服务

2+阅读 · 7月17日

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

专知会员服务

3+阅读 · 7月17日

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

专知会员服务

11+阅读 · 7月16日

《无人地面战车（UGV）的崛起》报告

《无人地面战车（UGV）的崛起》报告

专知会员服务

7+阅读 · 7月16日

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

专知会员服务

6+阅读 · 7月16日

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

专知会员服务

14+阅读 · 7月16日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

On the Effective Horizon of Inverse Reinforcement Learning

Arxiv

0+阅读 · 2023年7月13日

On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process

Arxiv

0+阅读 · 2023年7月12日

Fast Rates for the Regret of Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年7月12日

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

Arxiv

0+阅读 · 2023年7月11日

Reinforcement Learning with Non-Cumulative Objective

Arxiv

0+阅读 · 2023年7月11日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox

Arxiv

11+阅读 · 2022年12月1日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Arxiv

34+阅读 · 2022年6月30日

Coding for Distributed Multi-Agent Reinforcement Learning

Arxiv

32+阅读 · 2021年1月7日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

相关基金

rs4969170GG基因型抑制SOCS3基因转录活性促进肝癌发生发展的功能机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

AFF1和AFF4形成分子开关调控细胞分化

国家自然科学基金

0+阅读 · 2015年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

Kupffer细胞上GITRL在大鼠肝移植免疫耐受重建中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Spire1蛋白在胰腺β细胞胰岛素分泌中的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Doublecortin的动态表达在骨折愈合中的作用与调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

复形范畴中的Gorenstein同调维数

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员