April 21, 2023

Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees


Florent Delgrange, Ann Nowé, Guillermo A. Pérez

From arXiv; ICLR 2023. 10 pages of main text, 14 pages of appendix (excluding references).

Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments from a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
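The abstract does not spell out the time-to-failure verification algorithm, so the following is only a minimal sketch of what such a check might look like. It assumes that the distilled policy, run on the discrete latent model, induces a finite Markov chain, and it computes the probability of reaching a failure state within a fixed number of steps by backward iteration. The transition matrix `P`, the state indices, and the function name `failure_within` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import List, Set

# Hypothetical 4-state latent Markov chain induced by a distilled policy.
# P[s][t] is the probability of moving from latent state s to t in one step.
P: List[List[float]] = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.1, 0.1],
    [0.0, 0.0, 1.0, 0.0],  # state 2: safe absorbing state (assumption)
    [0.0, 0.0, 0.0, 1.0],  # state 3: failure state (assumption)
]


def failure_within(P: List[List[float]], failure: Set[int], horizon: int) -> List[float]:
    """For each state, return the probability of reaching a state in
    `failure` within `horizon` steps (bounded reachability)."""
    n = len(P)
    # With zero steps left, only failure states have already "failed".
    prob = [1.0 if s in failure else 0.0 for s in range(n)]
    for _ in range(horizon):
        # One backward step: failure states stay at 1, every other state
        # averages the previous values of its successors.
        prob = [
            1.0 if s in failure else sum(P[s][t] * prob[t] for t in range(n))
            for s in range(n)
        ]
    return prob


if __name__ == "__main__":
    for k in (1, 2, 10):
        print(k, failure_within(P, {3}, k))
```

Because the latent state space is finite, this kind of computation is exact on the latent model; how the result transfers to the original environment depends on the abstraction guarantees the paper establishes, which the sketch does not model.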

