Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before we can overcome interference, we must understand it better. In this work, we provide a definition and a novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference and show that it correlates with instability in control performance across a variety of network architectures. The new measure allows us to ask novel scientific questions about commonly used deep learning architectures and to study learning algorithms that mitigate interference. Lastly, we outline a class of algorithms, which we call online-aware, that are designed to mitigate interference. We show that they reduce interference according to our measure and that they improve stability and performance in several classic control environments.