CLUTR: Curriculum Learning via Unsupervised Task Representation Learning - 专知论文

会员服务 ·

0

Learning · 泛化理论 · 无监督 · 流形 · Agent ·

2023 年 3 月 7 日

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

翻译：CLUTR：基于无监督任务表征学习的课程学习

Abdus Salam Azad,Izzeddin Gur,Jasper Emhoff,Nathaniel Alexis,Aleksandra Faust,Pieter Abbeel,Ion Stoica

from arxiv, Preprint, Currently Under Review

Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies; creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed-pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments: achieving 10.6X and 45\% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions.

翻译：强化学习（RL）算法常因样本效率低下和泛化困难而著称。近期，无监督环境设计（UED）通过同时学习任务分布和智能体在生成任务上的策略，成为一种面向零样本泛化的新范式。这是一个非平稳过程，其中任务分布随智能体策略共同演化，导致随时间推移出现不稳定性。尽管以往研究展示了此类方法的潜力，但如何有效采样任务空间仍是制约这些方法的瓶颈挑战。为此，我们提出CLUTR：一种新颖的无监督课程学习算法，将任务表征与课程学习解耦为两阶段优化。该算法首先在随机生成任务上训练循环变分自编码器，以学习隐式任务流形；随后，教师智能体通过从该流形采样的隐式任务集上最大化基于最小最大遗憾值的优化目标来构建课程。通过使用固定预训练的任务流形，我们证明CLUTR成功克服了非平稳性问题并提升了稳定性。实验结果表明，在具有挑战性的CarRacing和导航环境中，CLUTR优于原则性强且流行的UED方法PAIRED：零样本泛化性能分别提升10.6倍和45%。同时，CLUTR在CarRacing任务上达到了与非UED先进方法相当的性能，且所需环境交互次数减少500倍。

0

相关内容

Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

离子注入合成In纳米颗粒在Al薄膜中超导性质的研究

国家自然科学基金

0+阅读 · 2015年12月31日

大鼠浅筋膜中腺苷受体与针刺镇痛相关性研究

国家自然科学基金

0+阅读 · 2014年12月31日

工件可拒绝的折衷排序和在线排序

国家自然科学基金

0+阅读 · 2014年12月31日

LncRNA参与Arc调控海马神经元突触重塑在癫痫发生中的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Rho/ROCK信号通路的双黄连注射液致过敏样反应机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常对颌骨骨化纤维瘤发病的作用

国家自然科学基金

0+阅读 · 2011年12月31日

框架中若干问题的研究

国家自然科学基金

0+阅读 · 2011年12月31日

压缩采样框架下的自适应稀疏信号感知与重建

国家自然科学基金

0+阅读 · 2009年12月31日

Proximal Curriculum for Reinforcement Learning Agents

Arxiv

0+阅读 · 2023年4月25日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

Deep Generative Models on 3D Representations: A Survey

Arxiv

15+阅读 · 2022年10月27日

Self-Supervised Learning via Maximum Entropy Coding

Arxiv

13+阅读 · 2022年10月20日

Graph Self-Supervised Learning: A Survey

Arxiv

15+阅读 · 2021年8月5日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Curriculum Learning: A Survey

Arxiv

24+阅读 · 2021年1月25日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Deep learning in agriculture: A survey

Arxiv

11+阅读 · 2018年7月31日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

1+阅读 · 今天15:02

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

1+阅读 · 今天15:00

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

2+阅读 · 今天14:30

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

2+阅读 · 今天14:05

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

2+阅读 · 今天13:55

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

2+阅读 · 今天13:51

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

2+阅读 · 今天13:48

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

7+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

5+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

7+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

20+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

相关VIP内容

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 3D场景图：开放挑战与未来方向

21世纪的无人机战争

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

相关资讯

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Proximal Curriculum for Reinforcement Learning Agents

Arxiv

0+阅读 · 2023年4月25日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

Deep Generative Models on 3D Representations: A Survey

Arxiv

15+阅读 · 2022年10月27日

Self-Supervised Learning via Maximum Entropy Coding

Arxiv

13+阅读 · 2022年10月20日

Graph Self-Supervised Learning: A Survey

Arxiv

15+阅读 · 2021年8月5日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Curriculum Learning: A Survey

Arxiv

24+阅读 · 2021年1月25日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Deep learning in agriculture: A survey

Arxiv

11+阅读 · 2018年7月31日

相关基金

离子注入合成In纳米颗粒在Al薄膜中超导性质的研究

国家自然科学基金

0+阅读 · 2015年12月31日

大鼠浅筋膜中腺苷受体与针刺镇痛相关性研究

国家自然科学基金

0+阅读 · 2014年12月31日

工件可拒绝的折衷排序和在线排序

国家自然科学基金

0+阅读 · 2014年12月31日

LncRNA参与Arc调控海马神经元突触重塑在癫痫发生中的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Rho/ROCK信号通路的双黄连注射液致过敏样反应机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常对颌骨骨化纤维瘤发病的作用

国家自然科学基金

0+阅读 · 2011年12月31日

框架中若干问题的研究

国家自然科学基金

0+阅读 · 2011年12月31日

压缩采样框架下的自适应稀疏信号感知与重建

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员