We present a new computational model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven exploration. The reward is the novelty of the surprise rather than the surprise norm. We estimate surprise novelty as the retrieval error of a memory network in which the memory stores and reconstructs surprises. Our surprise memory (SM) augments surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments show that SM, combined with various surprise predictors, exhibits efficient exploration behavior and significantly boosts final performance in sparse-reward environments, including Noisy-TV, navigation, and challenging Atari games.
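To make "reward the novelty of the surprise, not its norm" concrete, here is a minimal toy sketch. It is not the SM architecture described in this work. It assumes that a surprise is a vector (for example, the prediction error of some surprise predictor), that the memory is a plain FIFO buffer of past surprises, and that reconstruction is a softmax-attention read over the stored slots. The class `SurpriseMemory`, its `intrinsic_reward` method, and the `capacity` and `temperature` parameters are hypothetical names chosen for illustration. The reward is the L2 norm of the retrieval (reconstruction) error, so a surprise the memory has already seen earns little reward however large its norm.

```python
import math
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


class SurpriseMemory:
    """Hypothetical toy memory: stores surprise vectors, reconstructs queries by attention."""

    def __init__(self, capacity: int = 128, temperature: float = 1.0):
        self.capacity = capacity        # max number of stored surprises (assumed FIFO)
        self.temperature = temperature  # softmax temperature for the attention read
        self.slots: List[Vector] = []

    def reconstruct(self, s: Vector) -> Vector:
        # Empty memory: nothing to retrieve, so the reconstruction is the zero vector.
        if not self.slots:
            return [0.0] * len(s)
        # Dot-product scores between the query surprise and each stored surprise.
        scores = [sum(a * b for a, b in zip(m, s)) / self.temperature for m in self.slots]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - top) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Reconstruction = attention-weighted sum of stored surprises.
        return [sum(w * m[i] for w, m in zip(weights, self.slots)) for i in range(len(s))]

    def write(self, s: Vector) -> None:
        self.slots.append(list(s))
        if len(self.slots) > self.capacity:
            self.slots.pop(0)  # evict the oldest surprise

    def intrinsic_reward(self, s: Vector) -> float:
        # Novelty of the surprise = retrieval error, measured before storing it.
        error = [a - b for a, b in zip(s, self.reconstruct(s))]
        self.write(s)
        return l2_norm(error)


mem = SurpriseMemory(capacity=4)
big = [10.0, 0.0]                        # a large surprise (norm 10)
r_first = mem.intrinsic_reward(big)      # 10.0: empty memory reconstructs zeros
r_repeat = mem.intrinsic_reward(big)     # 0.0: the same surprise is reconstructed exactly
r_new = mem.intrinsic_reward([0.0, 1.0])  # a different surprise is poorly reconstructed
print(r_first, r_repeat, r_new > r_repeat)
```

In this toy run the norm-10 surprise earns zero reward on its second occurrence, whereas a reward equal to the surprise norm would pay 10.0 again. A learned memory network would replace the raw buffer in any practical version; that substitution, like the attention read itself, is an assumption of this sketch.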