Recent advances in deep reinforcement learning (RL) have achieved strong results on high-dimensional control tasks, but applying RL to reachability problems raises a fundamental mismatch: reachability seeks to maximize the set of states from which a system remains safe indefinitely, while RL optimizes expected returns over a user-specified distribution. This mismatch can yield policies that perform poorly on low-probability states that nevertheless lie within the safe set. A natural alternative is to frame the problem as a robust optimization over a set of initial conditions specifying the initial state, dynamics, and safe set; however, whether this problem has a solution depends on the feasibility of the specified set, which is unknown a priori. We propose Feasibility-Guided Exploration (FGE), a method that simultaneously identifies a subset of feasible initial conditions under which a safe policy exists and learns a policy that solves the reachability problem over this set. Empirical results demonstrate that FGE learns policies with over 50% more coverage than the best existing method on challenging initial conditions across tasks in the MuJoCo simulator and the Kinetix simulator with pixel observations.