MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

Recently, diffusion models have emerged as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL). However, most of these works generalize poorly across tasks whose reward functions or dynamics change. To tackle this challenge, we propose a task-oriented conditional diffusion planner for offline meta-RL (MetaDiffuser), which treats the generalization problem as a conditional trajectory generation task with contextual representation. The key is to learn a context-conditioned diffusion model that can generate task-oriented trajectories for planning across diverse tasks. To enhance the dynamics consistency of the generated trajectories while encouraging them to achieve high returns, we further design a dual-guided module for the sampling process of the diffusion model. The proposed framework is robust to the quality of the warm-start data collected from the testing task and flexible enough to incorporate different task representation methods. Experimental results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines, demonstrating the strong conditional generation ability of the diffusion architecture.
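To make the dual-guidance idea concrete, the following is a minimal sketch of a guided reverse sampling loop, not the paper's implementation. It assumes the common gradient-guidance form in which the denoiser's mean is shifted by the noise variance times a weighted sum of two gradients: one that raises return and one that lowers a dynamics discrepancy. Everything here is hypothetical and chosen for illustration: the toy dynamics `s' = s + a`, the quadratic return around a scalar `goal` (standing in for the task context), the weights `w_return` and `w_dynamics`, and the `0.9 * traj` shrinkage that stands in for a learned, context-conditioned denoiser.

```python
import numpy as np

def return_grad(traj, goal):
    """Gradient of the toy return J = -sum_t (s_t - goal)^2 w.r.t. the trajectory."""
    grad = np.zeros_like(traj)
    grad[:, 0] = -2.0 * (traj[:, 0] - goal)
    return grad

def dynamics_discrepancy(traj):
    """Toy discrepancy D = sum_t (s_{t+1} - s_t - a_t)^2 for assumed dynamics s' = s + a."""
    s, a = traj[:, 0], traj[:, 1]
    resid = s[1:] - s[:-1] - a[:-1]
    return float(np.sum(resid ** 2))

def dynamics_grad(traj):
    """Gradient of -D w.r.t. the trajectory (points toward dynamics consistency)."""
    s, a = traj[:, 0], traj[:, 1]
    resid = s[1:] - s[:-1] - a[:-1]
    grad = np.zeros_like(traj)
    grad[1:, 0] -= 2.0 * resid   # -dD/ds_{t+1}
    grad[:-1, 0] += 2.0 * resid  # -dD/ds_t
    grad[:-1, 1] += 2.0 * resid  # -dD/da_t
    return grad

def guided_mean(mu, sigma, goal, w_return, w_dynamics):
    """Shift the denoiser mean by sigma^2 times the weighted sum of both guides."""
    guide = w_return * return_grad(mu, goal) + w_dynamics * dynamics_grad(mu)
    return mu + sigma ** 2 * guide

def sample(horizon, goal, n_steps=50, w_return=1.0, w_dynamics=1.0, seed=0):
    """Toy reverse process over a trajectory of shape (horizon, 2) = [state, action]."""
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal((horizon, 2))
    for sigma in np.linspace(0.3, 0.0, n_steps):
        mu = 0.9 * traj  # hypothetical stand-in for a learned denoiser's mean
        mu = guided_mean(mu, sigma, goal, w_return, w_dynamics)
        traj = mu + sigma * rng.standard_normal(traj.shape)
    return traj
```

Calling `sample(horizon=8, goal=1.0)` returns an array of shape `(8, 2)`. Setting both weights to zero recovers the unguided denoiser mean, which is one way to see that the two guides only steer sampling and leave the underlying model untouched.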

