Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk - Zhuanzhi Papers

May 2, 2024

Xinyi Ni, Lifeng Lai

Robust Markov Decision Processes (RMDPs) have attracted significant research interest as an alternative to standard Markov Decision Processes (MDPs), which typically assume fixed transition probabilities. RMDPs instead optimize for the worst case over an ambiguity set of transition models. Earlier work on RMDPs has focused largely on risk-neutral reinforcement learning (RL), where the goal is to minimize the expected total discounted cost. In this paper, we analyze the robustness of CVaR-based risk-sensitive RL under RMDPs. First, we consider predetermined ambiguity sets. Exploiting the coherence of CVaR, we establish a connection between robustness and risk sensitivity, so that techniques from risk-sensitive RL can be applied to the proposed problem. Next, motivated by decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve these, we define a new risk measure named NCVaR and establish the equivalence between NCVaR optimization and robust CVaR optimization. Finally, we propose value iteration algorithms and validate our approach in simulation experiments.
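The abstract builds on CVaR as the risk measure. As a minimal sketch (not code from the paper), the snippet below evaluates the CVaR of a discrete cost distribution with the Rockafellar-Uryasev formula CVaR_α(Z) = min_t { t + E[(Z − t)^+] / α }. It assumes α ∈ (0, 1] is the tail fraction and that larger costs are worse, so α = 1 gives the ordinary expectation and smaller α concentrates on the worst outcomes; some papers use a confidence level 1 − α instead. The function name cvar_rockafellar_uryasev is hypothetical.

```python
def cvar_rockafellar_uryasev(costs, probs, alpha):
    """CVaR_alpha of a discrete cost Z:  min over t of  t + E[(Z - t)^+] / alpha.

    Assumed convention: alpha in (0, 1] is the tail fraction and larger costs are
    worse, so alpha = 1 returns E[Z] and small alpha approaches the worst cost.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(costs) != len(probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must match costs and sum to 1")

    def objective(t):
        # t plays the role of VaR; the second term is the expected excess cost above t
        excess = sum(p * max(z - t, 0.0) for z, p in zip(costs, probs))
        return t + excess / alpha

    # The objective is convex and piecewise linear in t with kinks only at the
    # support points, so its minimum is attained at one of the listed costs.
    return min(objective(t) for t in costs)


costs = [0.0, 10.0, 100.0]
probs = [0.5, 0.4, 0.1]
for alpha in (1.0, 0.2, 0.1):
    print(alpha, cvar_rockafellar_uryasev(costs, probs, alpha))
```

For costs (0, 10, 100) with probabilities (0.5, 0.4, 0.1), α = 1 gives the mean 14, while α = 0.2 averages the worst 20% of probability mass (10% at cost 100 and 10% at cost 10), giving (10 + 1) / 0.2 = 55.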

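The link between robustness and risk sensitivity rests on coherence: a coherent risk measure can be written as a worst-case expectation over a set of distributions. For CVaR the standard set is U = {q : 0 ≤ q_i ≤ p_i / α, Σ_i q_i = 1}, and CVaR_α(Z) = max over q in U of E_q[Z]. The sketch below assumes this textbook ambiguity set (not necessarily the exact sets used in the paper) and computes the adversary's worst-case distribution by filling the largest costs first up to each cap, which solves this linear program exactly. The function name worst_case_distribution is hypothetical.

```python
def worst_case_distribution(costs, probs, alpha):
    """Adversarial reweighting behind CVaR's dual (robust) representation.

    Solves  max_q sum_i q_i * costs_i  s.t.  0 <= q_i <= probs_i / alpha,  sum_i q_i = 1
    by filling the largest costs first up to each cap, which is optimal for this LP.
    Assumes probs sums to 1 and 0 < alpha <= 1 (so the caps sum to at least 1).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    q = [0.0] * len(costs)
    remaining = 1.0
    for i in sorted(range(len(costs)), key=lambda j: costs[j], reverse=True):
        # push as much mass as the cap allows onto the costliest remaining outcome
        q[i] = min(probs[i] / alpha, remaining)
        remaining -= q[i]
        if remaining <= 0.0:
            break
    worst_expectation = sum(qi * c for qi, c in zip(q, costs))
    return q, worst_expectation


costs = [0.0, 10.0, 100.0]
probs = [0.5, 0.4, 0.1]
q, value = worst_case_distribution(costs, probs, alpha=0.2)
print(q, value)
```

On this three-outcome example with α = 0.2, the adversary moves all of its mass onto the two costliest outcomes, q = (0, 0.5, 0.5), and the resulting worst-case expectation 0.5 × 10 + 0.5 × 100 = 55 equals the CVaR at level 0.2 of the nominal distribution.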

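For the value iteration algorithms, a generic robust value iteration for a cost-minimizing MDP illustrates the idea. This is a sketch under stated assumptions, not the paper's NCVaR algorithm: each state-action pair (s, a) gets its own ambiguity set U(s, a) = {q : 0 ≤ q(s') ≤ P(s' | s, a) / α(s, a), Σ_{s'} q(s') = 1} with a hypothetical level α(s, a) ∈ (0, 1], so the inner worst case equals a one-step CVaR of the next-state values. The update is V(s) = min_a [ c(s, a) + γ max over q in U(s, a) of E_q[V] ]. The function names, the two-state example, and γ = 0.9 are all illustrative choices.

```python
def worst_case_expectation(values, probs, alpha):
    """max_q sum_i q_i * values_i  s.t.  0 <= q_i <= probs_i / alpha, sum_i q_i = 1.

    Greedy: fill the largest values first up to each cap (optimal for this LP).
    Assumes probs sums to 1 and 0 < alpha <= 1, so the set is non-empty.
    """
    remaining, total = 1.0, 0.0
    for i in sorted(range(len(values)), key=lambda j: values[j], reverse=True):
        take = min(probs[i] / alpha, remaining)
        total += take * values[i]
        remaining -= take
        if remaining <= 0.0:
            break
    return total


def robust_value_iteration(costs, P, alpha, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Robust value iteration (cost minimization) with (s, a)-dependent ambiguity sets.

    costs[s][a]    immediate cost c(s, a)
    P[s][a][s2]    nominal probability P(s2 | s, a)
    alpha[s][a]    hypothetical per-(s, a) level in (0, 1]; alpha = 1 means no ambiguity
    Returns the robust values V and a policy greedy w.r.t. the last Bellman update.
    """
    n = len(costs)
    V = [0.0] * n
    for _ in range(max_iter):  # assumes max_iter >= 1
        Q = [[costs[s][a] + gamma * worst_case_expectation(V, P[s][a], alpha[s][a])
              for a in range(len(costs[s]))]
             for s in range(n)]
        V_new = [min(row) for row in Q]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    policy = [row.index(min(row)) for row in Q]
    return V, policy


# Hypothetical example: state 0 has a 'safe' action 0 and a 'risky' action 1;
# state 1 is an absorbing, costly state.
COSTS = [[1.0, 0.0], [2.0, 2.0]]
P_NOMINAL = [
    [[1.0, 0.0], [0.9, 0.1]],  # from state 0: safe stays put, risky may slip to state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1 is absorbing under both actions
]
ALPHA_NOMINAL = [[1.0, 1.0], [1.0, 1.0]]  # no ambiguity: plain expected cost
ALPHA_ROBUST = [[1.0, 0.5], [1.0, 1.0]]   # ambiguity only around the risky action

for name, alpha in [("nominal", ALPHA_NOMINAL), ("robust", ALPHA_ROBUST)]:
    V, policy = robust_value_iteration(COSTS, P_NOMINAL, alpha)
    print(name, [round(v, 3) for v in V], policy)
```

Under the nominal model (all levels equal to 1) the risky action is preferred from state 0, with V(0) = 1.8 / 0.19 ≈ 9.47. Lowering the risky action's level to 0.5 lets the adversary double the probability of reaching the costly state (from 0.1 to 0.2), and the safe action, with V(0) = 10, becomes optimal.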