Square-root regret bounds for continuous-time episodic Markov decision processes - 专知论文


October 3, 2022


Xuefeng Gao,Xun Yu Zhou

We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. We present a learning algorithm based on value iteration and upper confidence bounds. We derive an upper bound on the worst-case expected regret of the proposed algorithm and establish a worst-case lower bound; both bounds are of the order of the square root of the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.
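To make the combination of value iteration and an upper confidence bound concrete, the following is a minimal sketch of optimistic backward value iteration for a tabular, discrete-time, finite-horizon MDP. It is a simplified discrete-time analogue chosen for illustration and is not the continuous-time algorithm of the paper. The function name `optimistic_value_iteration`, its arguments, the assumption that rewards lie in [0, 1], and the bonus shape `horizon / sqrt(n)` are all hypothetical.

```python
import numpy as np


def optimistic_value_iteration(counts, reward_sums, trans_counts, horizon,
                               bonus_scale=1.0):
    """Backward value iteration on an empirical model plus a UCB-style bonus.

    Hypothetical inputs (not taken from the paper):
      counts[s, a]          number of visits to state-action pair (s, a)
      reward_sums[s, a]     sum of observed rewards, each assumed in [0, 1]
      trans_counts[s, a, t] number of observed transitions (s, a) -> t
    """
    n = np.maximum(counts, 1)              # avoid division by zero
    r_hat = reward_sums / n                # empirical mean reward
    p_hat = trans_counts / n[:, :, None]   # empirical transition probabilities
    # Illustrative exploration bonus: shrinks as a pair is visited more often.
    bonus = bonus_scale * horizon / np.sqrt(n)

    n_states, n_actions = counts.shape
    q = np.zeros((horizon + 1, n_states, n_actions))
    v = np.zeros((horizon + 1, n_states))
    for h in range(horizon - 1, -1, -1):
        # Optimistic Bellman backup: reward + bonus + expected next value.
        q[h] = r_hat + bonus + p_hat @ v[h + 1]
        # No return can exceed the number of remaining steps.
        q[h] = np.minimum(q[h], horizon - h)
        v[h] = q[h].max(axis=1)
    policy = q[:horizon].argmax(axis=2)    # greedy action per step and state
    return v, policy
```

In an episodic learning loop, the counts would be updated after each episode and the greedy policy recomputed. Pairs that have never been visited receive the largest possible optimistic value, which is what drives exploration in this sketch.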

