Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later. However, a large replay buffer results in a heavy memory burden, especially for onboard and edge devices with limited memory capacities. We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm to alleviate this problem. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance in both feature-based and image-based tasks while easing the burden of large experience replay buffers.
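The abstract does not give the training objective, so the following is only a minimal sketch of what "consolidating knowledge from the target Q-network to the current Q-network" could look like. It assumes the consolidation term is a squared difference between the two networks' Q-values on a set of sampled states, added to the usual one-step DQN temporal-difference loss. The function names, the weight parameter, and the exact form of the penalty are all hypothetical and are not taken from the source.

```python
# Hypothetical sketch; names and the form of the penalty are assumptions,
# not the authors' stated method.

def td_loss(q_sa, reward, next_q_target, gamma, done):
    """Squared one-step TD error for a single transition.

    q_sa:          current network's Q-value for the action taken
    next_q_target: target network's Q-values for every action in the next state
    """
    bootstrap = 0.0 if done else gamma * max(next_q_target)
    return (reward + bootstrap - q_sa) ** 2


def consolidation_loss(q_current, q_target):
    """Mean squared gap between current and target Q-values.

    Both arguments are lists of per-state Q-value lists, evaluated on the
    same sampled states (assumed form of the consolidation penalty).
    """
    total, count = 0.0, 0
    for qs_cur, qs_tgt in zip(q_current, q_target):
        for qc, qt in zip(qs_cur, qs_tgt):
            total += (qc - qt) ** 2
            count += 1
    return total / count


def combined_loss(td, consolidation, weight):
    """TD loss plus a weighted consolidation penalty (weight is hypothetical)."""
    return td + weight * consolidation


if __name__ == "__main__":
    td = td_loss(q_sa=1.0, reward=0.5, next_q_target=[1.0, 2.0],
                 gamma=0.9, done=False)
    cons = consolidation_loss([[1.0, 2.0], [0.0, 1.0]],
                              [[1.0, 1.0], [0.0, 3.0]])
    print(combined_loss(td, cons, weight=2.0))
```

In this sketch the consolidation term needs only states, not stored transitions, which is one way a method could rely on a much smaller buffer. Whether the proposed algorithms work this way is not stated in the abstract.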