Deep reinforcement learning agents often suffer from catastrophic forgetting: when trained on new data, they lose previously found solutions in parts of the input space. Replay memories are a common remedy, decorrelating and shuffling old and new training samples. However, conventional replay memories naively store state transitions as they arrive, without regard for redundancy. We introduce a novel, cognitively inspired replay memory approach based on the Grow-When-Required (GWR) self-organizing network, which resembles a map-based mental model of the world. Our approach organizes stored transitions into a concise, environment-model-like network of state nodes and transition edges, merging similar samples to reduce memory size and increase the pairwise distance among samples, which in turn increases the relevance of each sample. Overall, our paper shows that map-based experience replay allows for significant memory reduction with only small performance decreases.
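To make the idea of merging similar transitions into state nodes and transition edges concrete, the following is a minimal sketch in Python. It is not the paper's GWR implementation: the full Grow-When-Required network also involves mechanisms such as habituation and edge aging that are omitted here. The class name `MapReplayMemory`, the parameters `merge_radius` and `learning_rate`, and the choice to average rewards per edge are all assumptions made for illustration only.

```python
import math
import random


class MapReplayMemory:
    """Simplified map-style replay memory (illustrative, not the full GWR algorithm)."""

    def __init__(self, merge_radius=0.5, learning_rate=0.1):
        self.merge_radius = merge_radius    # hypothetical distance threshold for merging
        self.learning_rate = learning_rate  # how far a node moves toward a merged sample
        self.nodes = []                     # state prototypes (lists of floats)
        self.edges = {}                     # (src, action, dst) -> [mean_reward, count]

    def _match(self, state):
        """Return the index of the node representing `state`, adding a node if needed."""
        best, best_dist = None, math.inf
        for i, node in enumerate(self.nodes):
            d = math.dist(node, state)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.merge_radius:
            # No sufficiently similar node exists: grow the map.
            self.nodes.append(list(state))
            return len(self.nodes) - 1
        # A similar node exists: merge by nudging its prototype toward the sample.
        node = self.nodes[best]
        for k, x in enumerate(state):
            node[k] += self.learning_rate * (x - node[k])
        return best

    def add(self, state, action, reward, next_state):
        """Store a transition as an edge between two (possibly merged) state nodes."""
        src = self._match(state)
        dst = self._match(next_state)
        stats = self.edges.setdefault((src, action, dst), [0.0, 0])
        stats[1] += 1
        stats[0] += (reward - stats[0]) / stats[1]  # running mean of the reward

    def sample(self, batch_size, rng=random):
        """Sample edges uniformly and return (state, action, reward, next_state) tuples."""
        if not self.edges:
            return []
        keys = rng.choices(list(self.edges), k=batch_size)
        return [
            (tuple(self.nodes[s]), a, self.edges[(s, a, d)][0], tuple(self.nodes[d]))
            for s, a, d in keys
        ]
```

In this sketch, two transitions whose start and end states both fall within `merge_radius` of existing nodes collapse into a single edge, so memory grows with the number of distinct regions visited rather than with the number of raw samples. A smaller `merge_radius` keeps more detail at the cost of more nodes, which mirrors the trade-off between memory reduction and performance described in the abstract.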