Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning


Risk measures · Metrics · Reinforcement learning · Deep neural networks · Simulation methods

May 1, 2023


Anthony Coache, Sebastian Jaimungal, Álvaro Cartea

From arXiv, 41 pages, 7 figures

We propose a novel framework to solve risk-sensitive reinforcement learning (RL) problems where the agent optimises time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalizers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spectral risk measures may be approximated to arbitrary accuracy using deep neural networks, and (iii) develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions. We compare our conceptually improved reinforcement learning algorithm with the nested simulation approach and illustrate its performance in two settings: statistical arbitrage and portfolio allocation on both simulated and real data.
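To make the idea of elicitability concrete, here is a minimal sketch in Python. It is not the scoring function or the estimation procedure from the paper: it uses the well-known pinball (quantile) score, which is strictly consistent for the alpha-quantile (value-at-risk), and then recovers the conditional value-at-risk from the Rockafellar-Uryasev representation. The function names, the toy loss sample, and the level `alpha = 0.85` are all assumptions chosen for illustration; the paper works with dynamic spectral risk measures and deep neural networks rather than a static sample and a grid search.

```python
def pinball_score(v, x, alpha):
    """Pinball (quantile) score of forecast v for outcome x at level alpha.

    Its expectation is minimised by the alpha-quantile, which is what
    makes the quantile (VaR) an elicitable functional.
    """
    indicator = 1.0 if x < v else 0.0
    return (alpha - indicator) * (x - v)


def mean_score(v, sample, alpha):
    """Average pinball score of forecast v over the sample."""
    return sum(pinball_score(v, x, alpha) for x in sample) / len(sample)


def elicit_var(sample, alpha):
    """Estimate VaR as the sample point with the lowest average score.

    A neural-network estimator would minimise the same kind of score by
    gradient descent; a search over sample points keeps this sketch short.
    """
    return min(sorted(sample), key=lambda v: mean_score(v, sample, alpha))


def cvar_from_var(sample, var, alpha):
    """Rockafellar-Uryasev formula: CVaR = VaR + E[(X - VaR)^+] / (1 - alpha)."""
    mean_excess = sum(max(x - var, 0.0) for x in sample) / len(sample)
    return var + mean_excess / (1.0 - alpha)


# Hypothetical toy sample of losses (larger values are worse outcomes).
losses = [float(i) for i in range(1, 11)]
alpha = 0.85

var_hat = elicit_var(losses, alpha)
cvar_hat = cvar_from_var(losses, var_hat, alpha)
print(f"VaR at level {alpha}: {var_hat}")
print(f"CVaR at level {alpha}: {cvar_hat:.4f}")
```

The point of the sketch is that the risk estimate comes out of a plain score-minimisation problem over observed outcomes, with no inner simulation loop, which is the property the abstract contrasts with the nested simulation approach.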

