Loss of Plasticity in Continual Deep Reinforcement Learning

The ability to learn continually is essential in a complex and changing world. In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises -- e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments that vary along several dimensions (e.g., similarity between games, number of games, number of frames per game), with some experiments spanning 50 days and 2 billion environment interactions. Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy -- the Concatenated ReLU (CReLU) activation function -- and demonstrate its effectiveness in facilitating continual learning in a changing environment.
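As a rough illustration of the mitigation named in the abstract, the sketch below implements CReLU under its usual definition, the concatenation of ReLU(x) and ReLU(-x), and compares it with plain ReLU on a small batch of made-up pre-activations. The array values and the helper `dead_unit_fraction` are assumptions for illustration only and are not taken from the paper's experiments.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def crelu(x, axis=-1):
    """Concatenate ReLU(x) and ReLU(-x) along `axis`, doubling its size."""
    return np.concatenate([relu(x), relu(-x)], axis=axis)

def dead_unit_fraction(grad_mask):
    """Fraction of pre-activation units whose local gradient is zero
    for every input in the batch (hypothetical helper)."""
    return float(np.mean(~np.any(grad_mask, axis=0)))

# Hypothetical pre-activations: batch of 4 inputs, 3 hidden units.
# Unit 0 is negative for every input, so plain ReLU never activates it.
pre = np.array([[-1.0, 2.0, -0.5],
                [-2.0, 1.0,  0.5],
                [-0.5, 3.0, -1.5],
                [-3.0, 0.5,  2.0]])

relu_out = relu(pre)    # shape (4, 3)
crelu_out = crelu(pre)  # shape (4, 6)

# Local gradient masks with respect to the pre-activations:
# ReLU passes gradient only where x > 0; CReLU passes gradient through
# one of its two halves wherever x != 0.
relu_grad_mask = pre > 0
crelu_grad_mask = pre != 0

relu_dead = dead_unit_fraction(relu_grad_mask)
crelu_dead = dead_unit_fraction(crelu_grad_mask)
print(relu_dead, crelu_dead)
```

In this toy batch, one of the three units receives no gradient under ReLU, while every unit receives some gradient under CReLU. The cost is that the layer's output width doubles, so the next layer needs twice as many input weights.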

