Offline Reinforcement Learning with Additional Covering Distributions - 专知论文


May 22, 2023

Offline Reinforcement Learning with Additional Covering Distributions


We study offline reinforcement learning (RL), that is, learning optimal policies from a logged dataset, with function approximation. Despite considerable effort, existing algorithms with theoretical finite-sample guarantees typically assume exploratory data coverage or strong realizability of the function classes, assumptions that are hard to satisfy in practice. Recent works have relaxed these strong assumptions, but they either require gap assumptions that only a subset of MDPs satisfy, or rely on behavior regularization, which makes even the optimality of the learned policy intractable. To address this challenge, we provide finite-sample guarantees for a simple algorithm based on marginalized importance sampling (MIS), showing that sample-efficient offline RL for general MDPs is possible with only a partial-coverage dataset and weakly realizable function classes, given additional side information in the form of a covering distribution. Furthermore, we demonstrate that the covering distribution trades off prior knowledge of the optimal trajectories against the coverage requirement of the dataset, revealing the effect of this inductive bias on the learning process.
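As a rough illustration of the marginalized importance sampling idea mentioned in the abstract (this is not the paper's algorithm), the sketch below builds a small tabular MDP and checks that reweighting rewards under a data distribution by the density ratio w(s, a) = d^pi(s, a) / d^D(s, a) recovers the value of a policy pi. Everything here is an assumption made for illustration: the random MDP, the names `discounted_occupancy`, `d_D`, and `w`, and the fact that the ratio is computed exactly from the known model rather than learned from a function class.

```python
import numpy as np


def discounted_occupancy(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy d^pi(s, a).

    P[s, a, t] is the probability of moving from s to t under action a,
    pi[s, a] is the policy, mu0[s] is the initial state distribution.
    """
    S = pi.shape[0]
    # State-to-state kernel under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t].
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve d_s = (1 - gamma) * mu0 + gamma * P_pi^T d_s for the state occupancy.
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi


# Hypothetical toy problem: a random MDP with 4 states and 3 actions.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
R = rng.uniform(size=(S, A))                # reward table
pi = rng.dirichlet(np.ones(A), size=S)      # target policy, shape (S, A)
mu0 = np.full(S, 1.0 / S)

d_pi = discounted_occupancy(P, pi, mu0, gamma)

# Assumed data distribution; mixing with uniform keeps every (s, a) covered.
d_D = 0.5 * rng.dirichlet(np.ones(S * A)).reshape(S, A) + 0.5 / (S * A)

# Marginalized importance weight (exact here, learned in practice).
w = d_pi / d_D

# Value of pi computed directly and via reweighting the data distribution.
J_direct = np.sum(d_pi * R) / (1 - gamma)
J_mis = np.sum(d_D * w * R) / (1 - gamma)

# Sample-based version: draw (s, a) pairs from d_D and average w * r.
n = 100_000
idx = rng.choice(S * A, size=n, p=d_D.ravel())
J_hat = np.mean(w.ravel()[idx] * R.ravel()[idx]) / (1 - gamma)

print(J_direct, J_mis, J_hat)
```

In an actual offline setting neither d^pi nor d^D is known, so the weight would have to be estimated from data; the sketch only shows why a correct weight makes the reweighted average match the policy value.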



