World models power some of the most efficient reinforcement learning algorithms. In this work, we show that they can be harnessed for continual learning, a setting in which the agent faces changing environments. World models are typically trained from a replay buffer, which extends naturally to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations on the various modeling choices involved in using world models. We call the best combination of these choices Continual-Dreamer; it is task-agnostic and uses the world model for continual exploration. Continual-Dreamer is sample-efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on the Minigrid and Minihack benchmarks.
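To make the replay-buffer point concrete, here is a minimal sketch of one well-known selective experience replay strategy, reservoir sampling (Algorithm R). It is included for illustration only and is not presented as the exact buffer or configuration Continual-Dreamer uses. The class name `ReservoirReplayBuffer` and its methods are hypothetical. Stored items stand in for whatever the world model trains on, for example episodes or trajectory segments. A plain FIFO buffer overwrites data from earlier environments once a new one begins. Reservoir sampling instead keeps a uniform random sample of everything seen so far, so earlier tasks stay available for replay.

```python
import random
from typing import Any, List, Optional


class ReservoirReplayBuffer:
    """Hypothetical fixed-capacity replay buffer using reservoir sampling.

    After n items have been offered, each one remains in the buffer with
    probability capacity / n, so data from earlier tasks is not simply
    evicted when the environment changes (unlike a FIFO buffer).
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[Any] = []
        self.num_seen = 0  # total number of items offered so far
        self.rng = random.Random(seed)

    def add(self, item: Any) -> None:
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always keep the item.
            self.items.append(item)
        else:
            # Draw j uniformly from [0, num_seen); j < capacity happens with
            # probability capacity / num_seen, in which case a random slot
            # is overwritten by the new item.
            j = self.rng.randrange(self.num_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, batch_size: int) -> List[Any]:
        # Uniform sample without replacement from the current contents.
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


if __name__ == "__main__":
    buf = ReservoirReplayBuffer(capacity=100, seed=0)
    for i in range(1000):
        buf.add(("task_a", i))  # first environment
    for i in range(1000):
        buf.add(("task_b", i))  # environment changes
    print(sorted({task for task, _ in buf.items}))
```

With a capacity of 100 and 1,000 items from each of two tasks, a FIFO buffer would end up holding only `task_b` data. The reservoir buffer keeps roughly half of its slots for each task in expectation, which illustrates why the choice of replay strategy can affect forgetting.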