Data efficiency, which plays a key role in Reinforcement Learning (RL) training, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and must learn new tasks without forgetting previously acquired knowledge. However, little work has been done on applying data augmentation to continual RL. In this paper, we investigate the efficacy of data augmentation for continual RL. Specifically, we benchmark data augmentations for continual RL by (1) summarising existing data augmentation methods and (2) introducing a new augmentation method for continual RL: Adversarial Augmentation with Gradient Episodic Memory (Adv-GEM). Extensive experiments show that data augmentations such as random amplitude scaling, state-switch, mixup, adversarial augmentation, and Adv-GEM improve existing continual RL algorithms on robot control tasks in terms of average performance, catastrophic forgetting, and forward transfer. All data augmentation methods are implemented as plug-in modules for easy integration into continual RL methods.
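To make the plug-in idea concrete, here is a minimal sketch of two of the listed augmentations, random amplitude scaling and mixup, applied to state observations. This is illustrative only: the sampling ranges, the Beta hyperparameters, and the per-state scalar scale are common choices from the data-augmentation literature, not the paper's actual settings.

```python
import numpy as np

def random_amplitude_scaling(obs, low=0.6, high=1.2, rng=None):
    """Multiply each observation by a random scalar drawn uniformly
    from [low, high]. The range is an illustrative assumption."""
    rng = rng or np.random.default_rng()
    scale = rng.uniform(low, high, size=(obs.shape[0], 1))
    return obs * scale

def mixup(obs_a, obs_b, lam=None, alpha=0.4, rng=None):
    """Convexly combine two batches of observations. Drawing the
    mixing weight from Beta(alpha, alpha) follows standard mixup;
    alpha=0.4 is an illustrative default, not the paper's value."""
    rng = rng or np.random.default_rng()
    if lam is None:
        lam = rng.beta(alpha, alpha)
    return lam * obs_a + (1.0 - lam) * obs_b
```

In a plug-in setting, such functions would be applied to minibatches sampled from the replay buffer before the policy or value update, leaving the underlying continual RL algorithm unchanged.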