We present a new computing model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven exploration. The reward is the novelty of the surprise rather than its norm. We estimate surprise novelty as the retrieval error of a memory network that stores and reconstructs surprises. Our surprise memory (SM) augments surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that the SM, combined with various surprise predictors, exhibits efficient exploration behavior and significantly boosts final performance in sparse-reward environments, including Noisy-TV, navigation, and challenging Atari games.
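To make the distinction between surprise norm and surprise novelty concrete, the following is a minimal sketch and not the authors' architecture. It assumes the surprise is a forward-model prediction error, and it replaces the learned memory network with a hypothetical FIFO buffer read by dot-product attention. The names `SurpriseMemory`, `surprise_vector`, `capacity`, and `temperature` are illustrative assumptions and do not come from the abstract.

```python
import numpy as np


class SurpriseMemory:
    """Hypothetical FIFO memory of past surprise vectors (illustrative only)."""

    def __init__(self, capacity=32, temperature=1.0):
        self.capacity = capacity        # assumed buffer size
        self.temperature = temperature  # assumed attention sharpness
        self.slots = []                 # stored surprise vectors, oldest first

    def read(self, surprise):
        """Reconstruct `surprise` as an attention-weighted mix of stored slots."""
        if not self.slots:
            return np.zeros_like(surprise)
        memory = np.stack(self.slots)                  # shape (n, d)
        scores = memory @ surprise / self.temperature  # dot-product similarity
        scores = scores - scores.max()                 # numerical stability
        weights = np.exp(scores) / np.exp(scores).sum()
        return weights @ memory

    def write(self, surprise):
        """Store a surprise, evicting the oldest one when the buffer is full."""
        self.slots.append(np.array(surprise, dtype=float))
        if len(self.slots) > self.capacity:
            self.slots.pop(0)

    def novelty(self, surprise):
        """Intrinsic reward: the retrieval error, not the surprise norm."""
        surprise = np.asarray(surprise, dtype=float)
        reward = float(np.linalg.norm(surprise - self.read(surprise)))
        self.write(surprise)
        return reward


def surprise_vector(predicted_next, actual_next):
    """Assumed surprise signal: the prediction error of some forward model."""
    return np.asarray(actual_next, dtype=float) - np.asarray(predicted_next, dtype=float)


memory = SurpriseMemory()
u = surprise_vector(predicted_next=[0.0, 0.0], actual_next=[3.0, 4.0])
first = memory.novelty(u)   # memory is empty, so the retrieval error equals the norm
second = memory.novelty(u)  # the same surprise is now stored and is reconstructed
```

In this toy run the surprise norm stays at 5.0 on both steps, while the novelty reward drops from 5.0 to 0.0 once the surprise has been stored. A norm-based reward would keep paying for the recurring surprise; the retrieval-error reward does not. A fixed buffer like this one cannot reproduce the paper's handling of noisy observations, which depends on a memory network that learns to reconstruct surprises.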