Choquet regularization for reinforcement learning - 专知论文

会员服务 ·

0

正则化项 · Learning · 微分熵 · 强化学习 · JMLR ·

2022 年 8 月 17 日

Choquet regularization for reinforcement learning

翻译：强化学习的Choquet正规化

Xia Han,Ruodu Wang,Xun Yu Zhou

We propose \emph{Choquet regularizers} to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case via maximizing statically a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $\epsilon$-greedy, exponential, uniform and Gaussian.

翻译：我们建议 \ emph{ Choque 管理器} 测量和管理强化学习的勘探水平( RL ), 重新配置Wang et al. ( 2020, JMLR, 21 (198)) 的连续对流的RL 问题, 以此用Choquet 常规化器取代用于正规化的差异的 entropy 。我们从这个问题的汉密尔顿- Jacobi- Bellman 方程式中提取问题, 并通过静态地最大化平均逆差限制Choquet 常规化器( LQ) 来明确解决线性- 问题。在 LQ 设置下, 我们为几个特定的 Choquet 常规化器获得明确的最佳分配, 并反过来识别 Choquet 的常规化器, 产生大量广泛使用的探索性采样器, 如 $\ psilon- greedy, 指数、制服和高斯。

0

相关内容

正则化项

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

43+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【电子书推荐】机器学习导论Introduction to Machine Learning，斯坦福大学 | Nils J. Nilsson

【电子书推荐】机器学习导论Introduction to Machine Learning，斯坦福大学 | Nils J. Nilsson

专知会员服务

46+阅读 · 2019年11月19日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

不确定分数阶非线性系统Mittag-Leffler自适应控制

国家自然科学基金

1+阅读 · 2016年12月31日

含执行器死区/滞环非线性系统的模糊自适应容错控制

国家自然科学基金

0+阅读 · 2015年12月31日

非凸稀疏正则化模型与算法的研究

国家自然科学基金

3+阅读 · 2015年12月31日

混流式水轮机调节系统的非线性有限时间控制

国家自然科学基金

0+阅读 · 2015年12月31日

针刺调节HPA轴预防性治疗原发性痛经的脑机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

CITED2在心脏干细胞衰老中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

基于Archimedean三角模的区间犹豫模糊平均型集结算子及其在决策中的应用

国家自然科学基金

0+阅读 · 2012年12月31日

再生核希尔伯特空间中自适应滤波新方法及应用

国家自然科学基金

0+阅读 · 2012年12月31日

DEC1、DEC2对人乳腺癌细胞衰老的调控作用及其作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于事件的强化学习及其在群机器人优化控制中的应用

国家自然科学基金

3+阅读 · 2012年12月31日

Lyapunov Function Consistent Adaptive Network Signal Control with Back Pressure and Reinforcement Learning

Arxiv

0+阅读 · 2022年10月6日

Learning convergence prediction of astrobots in multi-object spectrographs

Arxiv

0+阅读 · 2022年10月5日

Neural Distillation as a State Representation Bottleneck in Reinforcement Learning

Arxiv

0+阅读 · 2022年10月5日

Hierarchical Adversarial Inverse Reinforcement Learning

Arxiv

0+阅读 · 2022年10月5日

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Arxiv

0+阅读 · 2022年10月3日

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

Arxiv

0+阅读 · 2022年10月3日

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

Arxiv

1+阅读 · 2022年9月30日

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Arxiv

0+阅读 · 2022年9月30日

Observational Robustness and Invariances in Reinforcement Learning via Lexicographic Objectives

Arxiv

0+阅读 · 2022年9月30日

Reinforcement Learning based Air Combat Maneuver Generation

Reinforcement Learning based Air Combat Maneuver Generation

Arxiv

92+阅读 · 2022年1月14日

VIP会员

文章信息

相关主题

最新内容

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

2+阅读 · 今天11:43

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

2+阅读 · 今天11:41

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

5+阅读 · 今天6:30

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

5+阅读 · 今天6:18

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

6+阅读 · 今天6:08

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

6+阅读 · 今天5:54

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

7+阅读 · 今天5:22

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

7+阅读 · 今天5:15

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

7+阅读 · 今天3:42

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

5+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

7+阅读 · 6月24日

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

10+阅读 · 6月24日

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

9+阅读 · 6月24日

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

7+阅读 · 6月24日

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

9+阅读 · 6月24日

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

43+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【电子书推荐】机器学习导论Introduction to Machine Learning，斯坦福大学 | Nils J. Nilsson

【电子书推荐】机器学习导论Introduction to Machine Learning，斯坦福大学 | Nils J. Nilsson

专知会员服务

46+阅读 · 2019年11月19日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

网状网络及其在军事领域的运用

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

相关论文

Lyapunov Function Consistent Adaptive Network Signal Control with Back Pressure and Reinforcement Learning

Arxiv

0+阅读 · 2022年10月6日

Learning convergence prediction of astrobots in multi-object spectrographs

Arxiv

0+阅读 · 2022年10月5日

Neural Distillation as a State Representation Bottleneck in Reinforcement Learning

Arxiv

0+阅读 · 2022年10月5日

Hierarchical Adversarial Inverse Reinforcement Learning

Arxiv

0+阅读 · 2022年10月5日

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Arxiv

0+阅读 · 2022年10月3日

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

Arxiv

0+阅读 · 2022年10月3日

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning

Arxiv

1+阅读 · 2022年9月30日

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Arxiv

0+阅读 · 2022年9月30日

Observational Robustness and Invariances in Reinforcement Learning via Lexicographic Objectives

Arxiv

0+阅读 · 2022年9月30日

Reinforcement Learning based Air Combat Maneuver Generation

Reinforcement Learning based Air Combat Maneuver Generation

Arxiv

92+阅读 · 2022年1月14日

相关基金

不确定分数阶非线性系统Mittag-Leffler自适应控制

国家自然科学基金

1+阅读 · 2016年12月31日

含执行器死区/滞环非线性系统的模糊自适应容错控制

国家自然科学基金

0+阅读 · 2015年12月31日

非凸稀疏正则化模型与算法的研究

国家自然科学基金

3+阅读 · 2015年12月31日

混流式水轮机调节系统的非线性有限时间控制

国家自然科学基金

0+阅读 · 2015年12月31日

针刺调节HPA轴预防性治疗原发性痛经的脑机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

CITED2在心脏干细胞衰老中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

基于Archimedean三角模的区间犹豫模糊平均型集结算子及其在决策中的应用

国家自然科学基金

0+阅读 · 2012年12月31日

再生核希尔伯特空间中自适应滤波新方法及应用

国家自然科学基金

0+阅读 · 2012年12月31日

DEC1、DEC2对人乳腺癌细胞衰老的调控作用及其作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于事件的强化学习及其在群机器人优化控制中的应用

国家自然科学基金

3+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员