Reinforcement learning has achieved tremendous success in many Atari games. In this paper we explore the Lunar Lander environment and implement classical methods including Q-Learning, SARSA, and Monte Carlo (MC), as well as tile coding. We also implement neural-network-based methods including DQN, Double DQN, and Clipped DQN. On top of these, we propose a new algorithm called Heuristic RL, which uses heuristics to guide early-stage training while alleviating the human bias they introduce. Our experiments show promising results for the proposed method in the Lunar Lander environment.