Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency. Episodic control-based approaches address this by leveraging highly rewarded past experiences to improve the sample efficiency of DRL algorithms. However, previous episodic control-based approaches fail to utilize the latent information in historical behaviors (e.g., state transitions and topological similarities) and lack scalability during DRL training. This work introduces Neural Episodic Control with State Abstraction (NECSA), a simple but effective state abstraction-based episodic control method containing a more comprehensive episodic memory, a novel state evaluation, and a multi-step state analysis. We evaluate our approach on the MuJoCo and Atari tasks in OpenAI Gym domains. The experimental results indicate that NECSA achieves higher sample efficiency than state-of-the-art episodic control-based approaches. Our data and code are available at the project website\footnote{\url{https://sites.google.com/view/drl-necsa}}.