Hindsight Foresight Relabeling for Meta-Reinforcement Learning - 专知论文

会员服务 ·

0

学成 · Performer · 奖励函数 · 情景 · Performance ·

2021 年 9 月 18 日

Hindsight Foresight Relabeling for Meta-Reinforcement Learning

翻译：Meta-加强学习的超视前视展望重新标签

Michael Wan,Jian Peng,Tanmay Gangwani

Meta-reinforcement learning (meta-RL) algorithms allow for agents to learn new behaviors from small amounts of experience, mitigating the sample inefficiency problem in RL. However, while meta-RL agents can adapt quickly to new tasks at test time after experiencing only a few trajectories, the meta-training process is still sample-inefficient. Prior works have found that in the multi-task RL setting, relabeling past transitions and thus sharing experience among tasks can improve sample efficiency and asymptotic performance. We apply this idea to the meta-RL setting and devise a new relabeling method called Hindsight Foresight Relabeling (HFR). We construct a relabeling distribution using the combination of "hindsight", which is used to relabel trajectories using reward functions from the training task distribution, and "foresight", which takes the relabeled trajectories and computes the utility of each trajectory for each task. HFR is easy to implement and readily compatible with existing meta-RL algorithms. We find that HFR improves performance when compared to other relabeling methods on a variety of meta-RL tasks.

翻译：元强化学习( meta- RL) 算法允许代理商从少量经验中学习新的行为, 减轻RL 中的抽样低效率问题。然而, 元RL 代理商在只经历几条轨迹之后, 可以在测试时间快速适应新的任务。但是, 元RL 代理商在只经历几条轨迹之后, 元培训过程仍然缺乏样本效率。先前的工程发现, 在多任务RL 设置中, 重新标签过去的过渡, 从而在任务之间分享经验, 可以提高样本效率和效果。我们把这个想法应用到元RL 设置中, 并设计出一种新的重新标签方法, 叫做 Hindsight 前景重标签。我们用“ 黑洞” 组合来构建一个重新标签分布, 用于使用培训任务分布的奖励功能重新标签轨迹, 以及“ 展望” 组合, 将重新标签的轨迹标为每条轨迹, 并计算每项任务的轨迹的实用性。我们发现, HFRRR很容易实施, 并且很容易与现有的元RL 矩阵算法比较时, 我们发现 HFRFRL 改进了其他任务。

1

相关内容

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

专知会员服务

39+阅读 · 2020年5月30日

【牛津大学】深度学习时间序列预测，12页pdf, Deep Learning Time Series Forecasting

【牛津大学】深度学习时间序列预测，12页pdf, Deep Learning Time Series Forecasting

专知会员服务

174+阅读 · 2020年5月1日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

254+阅读 · 2020年4月19日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

85+阅读 · 2020年2月18日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Andrew NG的新书《Machine Learning Yearning》

Andrew NG的新书《Machine Learning Yearning》

我爱机器学习

11+阅读 · 2016年12月7日

Scalable Reinforcement Learning Policies for Multi-Agent Control

Arxiv

0+阅读 · 2021年11月10日

Information Directed Reward Learning for Reinforcement Learning

Arxiv

1+阅读 · 2021年11月10日

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay

Arxiv

0+阅读 · 2021年11月9日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

Energy-Based Hindsight Experience Prioritization

Arxiv

3+阅读 · 2018年10月8日

Multi-task Deep Reinforcement Learning with PopArt

Multi-task Deep Reinforcement Learning with PopArt

Arxiv

4+阅读 · 2018年9月12日

Relational Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年6月5日

VIP会员

文章信息

相关主题

最新内容

无人机数据战

无人机数据战

专知会员服务

6+阅读 · 6月28日

美军研究运用“奥林匹斯山”问题解决模型以制胜灰色地带行动

美军研究运用“奥林匹斯山”问题解决模型以制胜灰色地带行动

专知会员服务

4+阅读 · 6月28日

无人机非战争未来——实为亟待破解之困局

无人机非战争未来——实为亟待破解之困局

专知会员服务

3+阅读 · 6月28日

《语音控制无人机导航：打通自然语言与无人机航行间的技术壁垒》60页

《语音控制无人机导航：打通自然语言与无人机航行间的技术壁垒》60页

专知会员服务

6+阅读 · 6月28日

“信号打击”：美军提升电磁战班组级目标引导能力

“信号打击”：美军提升电磁战班组级目标引导能力

专知会员服务

4+阅读 · 6月28日

《军用与执法部门室内射击模拟训练场综论：技术、架构、人工智能及新兴趋势》

《军用与执法部门室内射击模拟训练场综论：技术、架构、人工智能及新兴趋势》

专知会员服务

5+阅读 · 6月28日

《从卫星通信互联到风险感知型超视距自主行动：新兴无人机架构中的通信、控制与安全》

《从卫星通信互联到风险感知型超视距自主行动：新兴无人机架构中的通信、控制与安全》

专知会员服务

5+阅读 · 6月28日

综述 | 状态空间模型遇见遥感：SSM/Mamba如何重塑遥感视觉？

综述 | 状态空间模型遇见遥感：SSM/Mamba如何重塑遥感视觉？

专知会员服务

6+阅读 · 6月28日

ICML 2026 | 当大模型开始发明自己的语言：如何让 LLM 用更少 Token 完成高强度推理

ICML 2026 | 当大模型开始发明自己的语言：如何让 LLM 用更少 Token 完成高强度推理

专知会员服务

7+阅读 · 6月28日

五角大楼启动“智能体网络”以推进人工智能赋能的战斗管理与目标打击

五角大楼启动“智能体网络”以推进人工智能赋能的战斗管理与目标打击

专知会员服务

15+阅读 · 6月27日

2025年全球二十起重大无人机作战事件

2025年全球二十起重大无人机作战事件

专知会员服务

6+阅读 · 6月27日

现代战争的隐蔽系统：伊朗战争十大启示

现代战争的隐蔽系统：伊朗战争十大启示

专知会员服务

7+阅读 · 6月27日

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

专知会员服务

7+阅读 · 6月26日

GNN跨域综述：从消息传递到图基础模型

GNN跨域综述：从消息传递到图基础模型

专知会员服务

12+阅读 · 6月26日

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

18+阅读 · 6月26日

相关VIP内容

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

专知会员服务

39+阅读 · 2020年5月30日

【牛津大学】深度学习时间序列预测，12页pdf, Deep Learning Time Series Forecasting

【牛津大学】深度学习时间序列预测，12页pdf, Deep Learning Time Series Forecasting

专知会员服务

174+阅读 · 2020年5月1日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

254+阅读 · 2020年4月19日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

85+阅读 · 2020年2月18日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

美军研究运用“奥林匹斯山”问题解决模型以制胜灰色地带行动

《语音控制无人机导航：打通自然语言与无人机航行间的技术壁垒》60页

无人机数据战

无人机非战争未来——实为亟待破解之困局

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Andrew NG的新书《Machine Learning Yearning》

Andrew NG的新书《Machine Learning Yearning》

我爱机器学习

11+阅读 · 2016年12月7日

相关论文

Scalable Reinforcement Learning Policies for Multi-Agent Control

Arxiv

0+阅读 · 2021年11月10日

Information Directed Reward Learning for Reinforcement Learning

Arxiv

1+阅读 · 2021年11月10日

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay

Arxiv

0+阅读 · 2021年11月9日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

Model-based Adversarial Meta-Reinforcement Learning

Arxiv

5+阅读 · 2020年6月16日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

Energy-Based Hindsight Experience Prioritization

Arxiv

3+阅读 · 2018年10月8日

Multi-task Deep Reinforcement Learning with PopArt

Multi-task Deep Reinforcement Learning with PopArt

Arxiv

4+阅读 · 2018年9月12日

Relational Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年6月5日

微信扫码咨询专知VIP会员