State-of-the-art reinforcement learning has enabled training agents on tasks of ever-increasing complexity. However, the current paradigm tends to favor training agents from scratch, either on every new task or on collections of tasks with a view towards generalizing to novel task configurations. The former suffers from poor data efficiency, while the latter struggles when test tasks are out-of-distribution. Agents that can effectively transfer their knowledge about the world offer a potential solution to both issues. In this paper, we investigate transfer learning in the context of model-based agents. Specifically, we aim to understand exactly when environment models provide an advantage, and why. We find that a model-based approach outperforms controlled model-free baselines for transfer learning. Through ablations, we show that both the policy and the dynamics model learnt through exploration matter for successful transfer. We demonstrate our results across three domains that vary in their requirements for transfer: in-distribution procedural (Crafter), in-distribution identical (RoboDesk), and out-of-distribution (Meta-World). Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.