Meta reinforcement learning (Meta RL) has been widely explored as a way to learn an unseen task quickly by transferring knowledge previously learned from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to densely cover the task distribution and need a large amount of data for each task. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires fewer real training tasks and less real data by performing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating in the learned latent context space, which has disentangled properties, and MDP-imagination through a generative world model in which physical knowledge is added to plain VAE networks. Our experiments on various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.
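To make the idea of meta-imagination by latent interpolation concrete, the following is a minimal sketch rather than the MetaDreamer implementation. It assumes that each real training task has already been encoded into a latent context vector and that imagined tasks are obtained by linear interpolation between two such vectors; the abstract does not specify the interpolation scheme, so the linear form, the function name `interpolate_contexts`, and the example vectors are all hypothetical.

```python
from typing import List, Sequence


def interpolate_contexts(
    z_a: Sequence[float], z_b: Sequence[float], num_points: int
) -> List[List[float]]:
    """Return num_points latent vectors evenly spaced from z_a to z_b (inclusive).

    Assumption: linear interpolation; the paper may use a different scheme.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    imagined = []
    for i in range(num_points):
        t = i / (num_points - 1)  # interpolation weight in [0, 1]
        imagined.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return imagined


# Hypothetical 2-D latent contexts inferred for two real training tasks.
z_task_a = [0.0, 1.0]
z_task_b = [1.0, 3.0]

# Intermediate vectors stand in for "imagined" tasks between the two real ones.
imagined_contexts = interpolate_contexts(z_task_a, z_task_b, num_points=5)
```

In a full pipeline, each interpolated vector would condition the generative world model to produce imagined transitions for a task that was never sampled; that step depends on the trained model and is not shown here.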