Augmenting Replay in World Models for Continual Reinforcement Learning

In continual reinforcement learning (RL), the environment of an RL agent changes over time. A successful system must balance two conflicting requirements: retaining performance on already learned tasks (stability) while learning new tasks (plasticity). The first-in-first-out (FIFO) replay buffer is commonly used to enhance learning in such settings, but it requires significant memory. We explore an augmentation to this buffer that alleviates the memory constraints, and combine it with a world-model-based RL algorithm to evaluate its effectiveness in facilitating continual learning. We evaluate our method on the Procgen and Atari RL benchmarks and show that the distribution-matching augmentation to the replay buffer, used in the context of latent world models, can successfully prevent catastrophic forgetting with significantly reduced computational overhead. However, we also find that this solution is not infallible: other failure modes, such as the opposite problem of lacking plasticity and being unable to learn a new task, remain a potential limitation in continual learning systems.
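The abstract does not spell out how the distribution-matching augmentation works, so the following is only a minimal sketch under stated assumptions. It pairs a short FIFO buffer with a reservoir-sampled buffer; reservoir sampling is one standard way for a fixed-size buffer to hold a uniform sample of everything seen so far, and is assumed here rather than taken from the text. The class name `AugmentedReplayBuffer`, its parameters, and the half-and-half sampling split are all hypothetical.

```python
import random
from collections import deque


class AugmentedReplayBuffer:
    """Hypothetical sketch: short FIFO buffer plus a reservoir-sampled buffer."""

    def __init__(self, fifo_capacity, reservoir_capacity, seed=0):
        self.fifo = deque(maxlen=fifo_capacity)  # holds only recent experience
        self.reservoir = []                      # uniform sample of all experience
        self.reservoir_capacity = reservoir_capacity
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        """Store one item (e.g. a trajectory segment) in both buffers."""
        self.fifo.append(item)  # deque drops the oldest item when full
        self.num_seen += 1
        if len(self.reservoir) < self.reservoir_capacity:
            self.reservoir.append(item)
        else:
            # Reservoir sampling (Algorithm R): the new item replaces a random
            # slot with probability reservoir_capacity / num_seen.
            j = self.rng.randrange(self.num_seen)
            if j < self.reservoir_capacity:
                self.reservoir[j] = item

    def sample(self, batch_size):
        """Draw a batch, half from each buffer (the split is an assumption)."""
        if not self.fifo:
            return []
        n_old = batch_size // 2
        recent = self.rng.choices(list(self.fifo), k=batch_size - n_old)
        old = self.rng.choices(self.reservoir, k=n_old)
        return recent + old


# Usage: two consecutive "tasks" of 500 items each, labelled by task id.
buffer = AugmentedReplayBuffer(fifo_capacity=50, reservoir_capacity=100)
for step in range(1000):
    task_id = 0 if step < 500 else 1
    buffer.add((task_id, step))

fifo_tasks = {task for task, _ in buffer.fifo}
reservoir_tasks = {task for task, _ in buffer.reservoir}
batch = buffer.sample(32)
```

In this sketch the FIFO part ends up holding only the most recent task, while the reservoir still contains items from the earlier one, which is the property that lets a small buffer keep old tasks represented during training.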

