Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards. To address this, intrinsic rewards can be generated from forward-model prediction errors, which decrease as the environment becomes known and thus incentivize an agent to explore novel states. While prediction-based intrinsic rewards can help agents solve hard exploration tasks, the underlying models can suffer from catastrophic forgetting, so that the intrinsic reward actually increases at previously visited states. We first examine the conditions and causes of catastrophic forgetting in grid world environments. We then propose a new method, FARCuriosity, inspired by how humans and animals learn. The method depends on fragmentation and recall: an agent fragments an environment based on surprisal and uses a separate local curiosity module (a prediction-based intrinsic reward function) for each fragment, so that no module is trained on the entire environment. At each fragmentation event, the agent stores the current module in long-term memory (LTM) and either initializes a new module or recalls a previously stored module, depending on how well the stored modules match the current state. With fragmentation and recall, FARCuriosity achieves less forgetting and better overall performance on Atari benchmark games with varied and heterogeneous environments. Thus, this work highlights the problem of catastrophic forgetting in prediction-based curiosity methods and proposes a solution.
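The fragmentation-and-recall loop described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the implementation evaluated in this work: the names `CuriosityModule`, `FragmentAndRecall`, `surprisal_threshold`, and `recall_radius` are hypothetical, the forward model is a tabular running average rather than a neural network, surprisal is taken to be the squared prediction error itself, and "match with the current state" is approximated by the Euclidean distance to the state at which a module was created.

```python
import math


class CuriosityModule:
    """Toy local curiosity module: a tabular forward model (an assumption)."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}  # (state, action) -> predicted next state

    def intrinsic_reward(self, state, action, next_state):
        # Unseen pairs default to "nothing moves", so ordinary local steps
        # give a small error and abrupt jumps give a large one.
        guess = self.pred.get((state, action), state)
        return sum((g - n) ** 2 for g, n in zip(guess, next_state))

    def update(self, state, action, next_state):
        guess = self.pred.get((state, action), state)
        self.pred[(state, action)] = tuple(
            g + self.lr * (n - g) for g, n in zip(guess, next_state)
        )


class FragmentAndRecall:
    """Switches between local modules at surprisal-triggered fragmentation events."""

    def __init__(self, surprisal_threshold=4.0, recall_radius=1.5):
        self.surprisal_threshold = surprisal_threshold  # hypothetical parameter
        self.recall_radius = recall_radius              # hypothetical parameter
        self.ltm = []        # long-term memory: list of (anchor_state, module)
        self.anchor = None   # state at which the active module was created
        self.module = CuriosityModule()

    def step(self, state, action, next_state):
        if self.anchor is None:
            self.anchor = state
        reward = self.module.intrinsic_reward(state, action, next_state)
        if reward > self.surprisal_threshold:
            # Assumed design choice: boundary transitions are not trained on,
            # so they stay surprising and keep triggering fragmentation.
            self._fragment(next_state)
        else:
            self.module.update(state, action, next_state)
        return reward

    def _fragment(self, state):
        # Look for a stored module whose anchor matches the new state.
        match = next(
            (i for i, (anchor, _) in enumerate(self.ltm)
             if math.dist(anchor, state) <= self.recall_radius),
            None,
        )
        # Store the current module in LTM, then recall or initialize.
        self.ltm.append((self.anchor, self.module))
        if match is None:
            self.anchor, self.module = state, CuriosityModule()
        else:
            self.anchor, self.module = self.ltm.pop(match)
```

In this sketch, an agent that learns a transition in one region, jumps to a distant region, and later jumps back recalls its original module from LTM, so the intrinsic reward for the already-learned transition stays low instead of being reset. Because the tabular stand-in cannot itself forget, the sketch shows only the bookkeeping of fragmentation and recall, not the forgetting behavior of a shared neural forward model.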