Temporal Self-Imitation Learning - 专知论文

会员服务 ·

0

Learning · 塑造 · FAST · 监督 · 稳健性 ·

Temporal Self-Imitation Learning

翻译：暂无翻译

Yinsen Jia,Boyuan Chen

Long-horizon robot manipulation policies trained with reward shaping can still exploit dense rewards through inefficient interaction, while rare efficient behaviors may be forgotten during training. We argue that temporal efficiency itself provides a powerful and underutilized source of self-supervision for reinforcement learning. We introduce Temporal Self-Imitation Learning (TSIL), a reinforcement learning framework that mines temporally efficient successful trajectories generated during learning and converts them into reusable supervision for future policy improvement. TSIL progressively refines learning using configuration-conditioned adaptive temporal targets derived from fast successful trajectories, while preserving and replaying efficient behaviors through efficiency-weighted self-imitation learning. Across 15 distinct long-horizon manipulation tasks, TSIL consistently improves learning efficiency, task-completion efficiency, revisitation of fast successful behaviors, and robustness to unstable training conditions. More broadly, our results suggest that the temporal structure of successful behavior itself provides a scalable self-supervisory signal for reinforcement learning beyond manually engineered reward shaping alone.

翻译：暂无翻译

0

相关内容

Learning

【伯克利博士论文】物理世界中可泛化且可扩展的机器人学习

【伯克利博士论文】物理世界中可泛化且可扩展的机器人学习

专知会员服务

22+阅读 · 1月18日

Nature：大脑中的多时间尺度强化学习

Nature：大脑中的多时间尺度强化学习

专知会员服务

18+阅读 · 2025年6月8日

【ICML2021】密度约束强化学习

专知会员服务

22+阅读 · 2021年6月26日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

67+阅读 · 2020年8月22日

【ICML2020】深度神经网络置信感知学习，Conﬁdence-Aware Learning for Deep Neural Networks

【ICML2020】深度神经网络置信感知学习，Conﬁdence-Aware Learning for Deep Neural Networks

专知会员服务

74+阅读 · 2020年7月6日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

专知会员服务

48+阅读 · 2019年6月10日

【ICML2019 tutorial】永无止境的学习（Never-Ending Learning），Tom M. Mitchell，Partha Talukdar

【ICML2019 tutorial】永无止境的学习（Never-Ending Learning），Tom M. Mitchell，Partha Talukdar

专知会员服务

21+阅读 · 2019年6月10日

解读自监督学习(Self-Supervised Learning)几篇相关paper

解读自监督学习(Self-Supervised Learning)几篇相关paper

CVer

25+阅读 · 2020年2月21日

【赠书】TensorFlow自然语言处理

【赠书】TensorFlow自然语言处理

AINLP

17+阅读 · 2019年7月14日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

泡泡机器人SLAM

11+阅读 · 2018年10月6日

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

深度学习与NLP

15+阅读 · 2018年6月20日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

融合人脑意图与力觉反馈的外骨骼机器人步态控制CPG模型及调节方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于四维航迹运行的航路网络飞行安全间隔保持理论与方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于记忆的不变图像特征学习方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

在轨航天器诊断策略自动构建与学习调控方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

机器灵巧手基于触滑觉信息协同的自适应力控制方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

可与MPSoC高度融合的片上自主测试-自主修复关键技术研究：针对自然、人为可靠性威胁

国家自然科学基金

0+阅读 · 2015年12月31日

基于确定学习方法的无人水面艇智能控制研究

国家自然科学基金

17+阅读 · 2014年12月31日

基于深度学习的特征融合在移动机器人视觉中的场景理解及研究

国家自然科学基金

12+阅读 · 2014年12月31日

基于逆向强化学习和人工智能的移动机器人自主学习方法研究

国家自然科学基金

12+阅读 · 2013年12月31日

面向人与Agent混合的多团队协作仿真训练方法研究

国家自然科学基金

19+阅读 · 2012年12月31日

SC3-Eval: Evaluating Robot Foundation Models via Self-Consistent Video Generation

Arxiv

0+阅读 · 6月17日

AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Arxiv

0+阅读 · 6月9日

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Arxiv

0+阅读 · 6月4日

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Arxiv

0+阅读 · 6月2日

Driving Intents Amplify Planning-Oriented Reinforcement Learning

Arxiv

0+阅读 · 5月14日

Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning

Arxiv

0+阅读 · 5月8日

Imitation Learning: Progress, Taxonomies and Opportunities

Arxiv

12+阅读 · 2021年6月23日

Model-Contrastive Federated Learning

Arxiv

10+阅读 · 2021年3月30日

Learning Latent Representations to Influence Multi-Agent Interaction

Arxiv

11+阅读 · 2020年11月12日

Self-supervised Learning: Generative or Contrastive

Arxiv

19+阅读 · 2020年7月21日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

4+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

5+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

9+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

6+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

8+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

11+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

7+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

5+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

7+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

7+阅读 · 6月17日

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

专知会员服务

6+阅读 · 6月17日

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

7+阅读 · 6月16日

相关VIP内容

【伯克利博士论文】物理世界中可泛化且可扩展的机器人学习

【伯克利博士论文】物理世界中可泛化且可扩展的机器人学习

专知会员服务

22+阅读 · 1月18日

Nature：大脑中的多时间尺度强化学习

Nature：大脑中的多时间尺度强化学习

专知会员服务

18+阅读 · 2025年6月8日

【ICML2021】密度约束强化学习

专知会员服务

22+阅读 · 2021年6月26日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

67+阅读 · 2020年8月22日

【ICML2020】深度神经网络置信感知学习，Conﬁdence-Aware Learning for Deep Neural Networks

【ICML2020】深度神经网络置信感知学习，Conﬁdence-Aware Learning for Deep Neural Networks

专知会员服务

74+阅读 · 2020年7月6日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

【ICML2019 tutorial】主动学习:从理论到实践（Active Learning: From Theory to Practice），Robert Nowak，Steve Hanneke

专知会员服务

48+阅读 · 2019年6月10日

【ICML2019 tutorial】永无止境的学习（Never-Ending Learning），Tom M. Mitchell，Partha Talukdar

【ICML2019 tutorial】永无止境的学习（Never-Ending Learning），Tom M. Mitchell，Partha Talukdar

专知会员服务

21+阅读 · 2019年6月10日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

相关资讯

解读自监督学习(Self-Supervised Learning)几篇相关paper

解读自监督学习(Self-Supervised Learning)几篇相关paper

CVer

25+阅读 · 2020年2月21日

【赠书】TensorFlow自然语言处理

【赠书】TensorFlow自然语言处理

AINLP

17+阅读 · 2019年7月14日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

【泡泡图灵智库】密集相关的自监督视觉描述学习（RAL）

泡泡机器人SLAM

11+阅读 · 2018年10月6日

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

李宏毅-201806-中文-Deep Reinforcement Learning精品课程分享

深度学习与NLP

15+阅读 · 2018年6月20日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

相关论文

SC3-Eval: Evaluating Robot Foundation Models via Self-Consistent Video Generation

Arxiv

0+阅读 · 6月17日

AgenticRL: Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Arxiv

0+阅读 · 6月9日

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Arxiv

0+阅读 · 6月4日

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Arxiv

0+阅读 · 6月2日

Driving Intents Amplify Planning-Oriented Reinforcement Learning

Arxiv

0+阅读 · 5月14日

Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning

Arxiv

0+阅读 · 5月8日

Imitation Learning: Progress, Taxonomies and Opportunities

Arxiv

12+阅读 · 2021年6月23日

Model-Contrastive Federated Learning

Arxiv

10+阅读 · 2021年3月30日

Learning Latent Representations to Influence Multi-Agent Interaction

Arxiv

11+阅读 · 2020年11月12日

Self-supervised Learning: Generative or Contrastive

Arxiv

19+阅读 · 2020年7月21日

相关基金

融合人脑意图与力觉反馈的外骨骼机器人步态控制CPG模型及调节方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于四维航迹运行的航路网络飞行安全间隔保持理论与方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

基于记忆的不变图像特征学习方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

在轨航天器诊断策略自动构建与学习调控方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

机器灵巧手基于触滑觉信息协同的自适应力控制方法研究

国家自然科学基金

3+阅读 · 2015年12月31日

可与MPSoC高度融合的片上自主测试-自主修复关键技术研究：针对自然、人为可靠性威胁

国家自然科学基金

0+阅读 · 2015年12月31日

基于确定学习方法的无人水面艇智能控制研究

国家自然科学基金

17+阅读 · 2014年12月31日

基于深度学习的特征融合在移动机器人视觉中的场景理解及研究

国家自然科学基金

12+阅读 · 2014年12月31日

基于逆向强化学习和人工智能的移动机器人自主学习方法研究

国家自然科学基金

12+阅读 · 2013年12月31日

面向人与Agent混合的多团队协作仿真训练方法研究

国家自然科学基金

19+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员