Adaptive Policy Learning for Offline-to-Online Reinforcement Learning - 专知论文

会员服务 ·

0

Learning · 在线 · 数据集 · 强化学习 · Extensibility ·

2023 年 3 月 14 日

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

翻译：自适应策略学习：面向离线到在线强化学习

Han Zheng,Xufang Luo,Pengfei Wei,Xuan Song,Dongsheng Li,Jing Jiang

from arxiv, AAAI2023

Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative solution by directly learning from the previously collected dataset. However, it will yield unsatisfactory performance if the quality of the offline datasets is poor. In this paper, we consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online, and propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data. Specifically, we explicitly consider the difference between the online and offline data and apply an adaptive update scheme accordingly, that is, a pessimistic update strategy for the offline dataset and an optimistic/greedy update scheme for the online dataset. Such a simple and effective method provides a way to mix the offline and online RL and achieve the best of both worlds. We further provide two detailed algorithms for implementing the framework through embedding value or policy-based RL algorithms into it. Finally, we conduct extensive experiments on popular continuous control tasks, and results show that our algorithm can learn the expert policy with high sample efficiency even when the quality of offline dataset is poor, e.g., random dataset.

翻译：传统强化学习需要与环境交互来收集新数据，但在在线交互成本高昂时，这种做法并不实用。离线强化学习通过直接利用先前收集的数据集进行学习，提供了一种替代方案。然而，若离线数据集质量较差，该方法会产生不令人满意的效果。本文考虑一种"离线到在线"设置——智能体首先通过离线数据集学习，随后进行在线训练，并提出一种名为自适应策略学习的框架，以有效利用离线与在线数据。具体而言，我们显式考虑在线与离线数据的差异，并据此应用自适应更新策略：对离线数据集采用悲观更新策略，对在线数据集采用乐观/贪婪更新策略。这种简单有效的方法提供了融合离线与在线强化学习、实现二者优势互补的途径。我们进一步提出两种具体算法，通过将基于价值或基于策略的强化学习算法嵌入该框架来实现。最后，我们在流行的连续控制任务上进行了大量实验，结果表明：即使离线数据集质量较差（例如随机数据集），我们的算法仍能以高样本效率学习专家策略。

0

相关内容

Learning

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

232+阅读 · 2022年2月3日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

MicroRNA调控BACE1在AD发病中的作用与机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

闭环双相磁电材料的尺寸效应研究

国家自然科学基金

0+阅读 · 2013年12月31日

贵金属—钙钛矿催化剂材料中的金属扩散和催化活性

国家自然科学基金

0+阅读 · 2013年12月31日

压电薄膜剪切压电应变特性的研究

国家自然科学基金

0+阅读 · 2013年12月31日

新型窄带隙锑化物二维电子气材料及输运特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

反钙钛矿结构Mn3XN(X=Cu,Ni)薄膜的制备及其磁相变和电输运特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

水滑石基可见光响应型复合光催化剂薄膜的研制及其光催化性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

纳米结构金刚石基板上单晶二氧化钛薄膜的低温制备与特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

SrxBa1-xNb2O6纳米陶瓷与薄膜的电卡效应

国家自然科学基金

0+阅读 · 2012年12月31日

A Survey on Offline Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年5月5日

Federated Ensemble-Directed Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年5月4日

Rethinking Population-assisted Off-policy Reinforcement Learning

Arxiv

0+阅读 · 2023年5月4日

Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare

Arxiv

0+阅读 · 2023年5月2日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Arxiv

17+阅读 · 2023年1月18日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

A Survey on Reinforcement Learning for Recommender Systems

Arxiv

22+阅读 · 2021年9月22日

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Arxiv

20+阅读 · 2018年1月8日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

最新内容

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

3+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

3+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

4+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

1+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

3+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

10+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

14+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

9+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

7+阅读 · 7月20日

相关VIP内容

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

232+阅读 · 2022年2月3日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于强化学习的自动化红队测试》

“天降毒雾”：无人机如何使化学战重返乌克兰战场

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Survey on Offline Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年5月5日

Federated Ensemble-Directed Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年5月4日

Rethinking Population-assisted Off-policy Reinforcement Learning

Arxiv

0+阅读 · 2023年5月4日

Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare

Arxiv

0+阅读 · 2023年5月2日

A Survey of Meta-Reinforcement Learning

Arxiv

12+阅读 · 2023年1月19日

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Arxiv

17+阅读 · 2023年1月18日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

A Survey on Reinforcement Learning for Recommender Systems

Arxiv

22+阅读 · 2021年9月22日

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Arxiv

20+阅读 · 2018年1月8日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

相关基金

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

MicroRNA调控BACE1在AD发病中的作用与机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

闭环双相磁电材料的尺寸效应研究

国家自然科学基金

0+阅读 · 2013年12月31日

贵金属—钙钛矿催化剂材料中的金属扩散和催化活性

国家自然科学基金

0+阅读 · 2013年12月31日

压电薄膜剪切压电应变特性的研究

国家自然科学基金

0+阅读 · 2013年12月31日

新型窄带隙锑化物二维电子气材料及输运特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

反钙钛矿结构Mn3XN(X=Cu,Ni)薄膜的制备及其磁相变和电输运特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

水滑石基可见光响应型复合光催化剂薄膜的研制及其光催化性能研究

国家自然科学基金

0+阅读 · 2012年12月31日

纳米结构金刚石基板上单晶二氧化钛薄膜的低温制备与特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

SrxBa1-xNb2O6纳米陶瓷与薄膜的电卡效应

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员