Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy - 专知 (Zhuanzhi) Papers


July 25, 2022

Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy


Xiyao Wang, Wichayaporn Wongkamjan, Furong Huang

From arXiv, 16 pages, 5 figures

Model-based reinforcement learning (RL) achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a "global" dynamics model that fits the state-action visitation distribution of all historical policies. In this paper, however, we find that learning a global dynamics model does not necessarily benefit model prediction for the current policy, because the policy in use is constantly evolving. The evolving policy shifts the state-action visitation distribution during training. We theoretically analyze how the distribution of historical policies affects model learning and model rollouts. We then propose a novel model-based RL method, Policy-adaptation Model-based Actor-Critic (PMAC), which learns a policy-adapted dynamics model through a policy-adaptation mechanism. This mechanism dynamically adjusts the historical policy mixture distribution so that the learned model continually adapts to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PMAC achieves state-of-the-art asymptotic performance and almost two times higher sample efficiency than prior model-based methods.
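To make the idea of adjusting a historical policy mixture concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each transition in a replay buffer is tagged with the index of the policy that collected it, and it uses a geometric recency weighting as a stand-in for the adjustment rule, which the abstract does not specify. The names `mixture_weights`, `sample_batch`, and `decay` are hypothetical.

```python
import numpy as np


def mixture_weights(num_policies: int, decay: float) -> np.ndarray:
    """Hypothetical mixture over historical policies 0..num_policies-1 (newest last).

    decay=1.0 gives a uniform mixture over policies; smaller values move
    probability mass toward the most recent policy.
    """
    ages = np.arange(num_policies - 1, -1, -1)  # newest policy has age 0
    w = decay ** ages
    return w / w.sum()


def sample_batch(policy_ids: np.ndarray, weights: np.ndarray,
                 batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw buffer indices so that policy k is selected with probability weights[k].

    policy_ids[i] is the index of the policy that collected transition i.
    """
    counts = np.bincount(policy_ids, minlength=len(weights))
    # Split each policy's weight evenly across the transitions it collected.
    probs = weights[policy_ids] / counts[policy_ids]
    probs = probs / probs.sum()  # renormalize in case some policy has no data
    return rng.choice(len(policy_ids), size=batch_size, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ids = np.array([0, 0, 0, 1, 1, 2])  # six transitions from three policies
    w = mixture_weights(3, decay=0.5)
    batch = sample_batch(ids, w, batch_size=4, rng=rng)
    print(w, ids[batch])
```

With `decay=1.0` every historical policy is weighted equally, which is one way to read the "global" model setting; lowering `decay` trains the dynamics model mostly on data from recent policies. How PMAC actually sets these weights is described in the paper, not here.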

