The sparsity of extrinsic rewards poses a serious challenge for reinforcement learning (RL). Much recent work has focused on curiosity, which can provide a representative intrinsic reward for effective exploration, yet the challenge is still far from solved. In this paper, we present a novel curiosity method for RL named DyMeCu, which stands for Dynamic Memory-based Curiosity. Inspired by human curiosity and information theory, DyMeCu consists of a dynamic memory and dual online learners. Curiosity is aroused when the memorized information cannot account for the current state; the information gap between the dual learners is formulated as the intrinsic reward for the agent, and the state information is then consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu better mimics human curiosity through its dynamic memory, and the memory module grows dynamically based on a bootstrap paradigm with the dual learners. Large-scale experiments on multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods both with and without extrinsic rewards. We will release the code to enhance reproducibility.
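To make the mechanism described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the dual learners and the memory are plain linear maps, that the "information gap" is the squared distance between the two learners' outputs, that each learner is trained to match the memory's output, and that memory consolidation is an exponential moving average of the learners' weights. The class name `DualLearnerCuriosity`, the helper `matvec`, and the hyperparameters `lr` and `ema` are all hypothetical.

```python
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class DualLearnerCuriosity:
    """Toy sketch: two linear online learners plus a slowly updated memory.

    Assumption: linear maps stand in for whatever encoders the paper uses.
    """

    def __init__(self, state_dim, feat_dim, lr=0.01, ema=0.99, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                    for _ in range(feat_dim)]

        self.learners = [init(), init()]
        # Assumption: the memory starts as the average of the two learners.
        self.memory = self._mean_of_learners()
        self.lr = lr
        self.ema = ema

    def _mean_of_learners(self):
        a, b = self.learners
        return [[0.5 * (x + y) for x, y in zip(ra, rb)]
                for ra, rb in zip(a, b)]

    def intrinsic_reward(self, state):
        # Assumption: the information gap is the squared distance
        # between the outputs of the two learners.
        out1, out2 = (matvec(w, state) for w in self.learners)
        return sum((p - q) ** 2 for p, q in zip(out1, out2))

    def update(self, state):
        # Each learner takes one gradient step toward the memory's output.
        target = matvec(self.memory, state)
        for w in self.learners:
            err = [o - t for o, t in zip(matvec(w, state), target)]
            for row, e in zip(w, err):
                for j, x in enumerate(state):
                    row[j] -= self.lr * e * x
        # Assumption: consolidation is an exponential moving average
        # of the learners' mean weights into the memory.
        mean = self._mean_of_learners()
        self.memory = [
            [self.ema * m + (1 - self.ema) * v for m, v in zip(rm, rv)]
            for rm, rv in zip(self.memory, mean)
        ]
```

In this toy setting, an agent would call `intrinsic_reward(state)` to obtain the exploration bonus and then `update(state)` to train the learners and consolidate the memory. With the default step size and a state of small norm, the reward for a repeatedly visited state shrinks over time, which is the qualitative behavior one expects from a curiosity signal.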