Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, and it stands in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. Through offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.