Q-learning is one of the best-known reinforcement learning (RL) algorithms, and considerable effort has gone into extending it with neural networks. Bootstrapped Deep Q-Network (Bootstrapped DQN) is one such extension: it uses multiple neural network heads to introduce diversity into Q-learning. Diversity can be viewed as the number of reasonable actions an agent can take in a given state, analogous to the exploration rate in RL, so the performance of Bootstrapped DQN is closely tied to the level of diversity within the algorithm. The original research pointed out that adding a random prior could improve the model's performance. In this article, we explore replacing the prior with noise sampled from a Gaussian distribution to introduce more diversity into the algorithm. We conduct our experiments on the Atari benchmark and compare our algorithm against both the original and other related algorithms. The results show that our modification of Bootstrapped DQN achieves significantly higher evaluation scores across different types of Atari games. We therefore conclude that replacing priors with noise can improve Bootstrapped DQN's performance by preserving diversity across the ensemble.
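To make the two diversity mechanisms concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the class and function names (`MultiHeadQNet`, `q_with_random_prior`, `q_with_gaussian_noise`), the head architecture, and the values of `beta` and `sigma` are all illustrative assumptions. The first function follows the randomized-prior idea of adding a fixed, untrained network's output to each head; the second replaces that prior with freshly sampled Gaussian noise, as proposed above.

```python
# Minimal sketch (assumptions, not the authors' code) of multi-head Q-values
# with (a) a fixed random prior and (b) Gaussian noise in place of the prior.
import torch
import torch.nn as nn


class MultiHeadQNet(nn.Module):
    """K independent Q-heads on a shared feature trunk (Bootstrapped DQN style)."""

    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(128, n_actions) for _ in range(n_heads)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        feat = self.trunk(obs)
        # Stack per-head Q-values into shape (batch, n_heads, n_actions).
        return torch.stack([head(feat) for head in self.heads], dim=1)


def q_with_random_prior(net, prior_net, obs, beta: float = 3.0):
    # Randomized-prior variant: each head's value is f_k(s) + beta * p_k(s),
    # where p_k comes from a fixed, untrained network (gradients blocked).
    with torch.no_grad():
        prior = prior_net(obs)
    return net(obs) + beta * prior


def q_with_gaussian_noise(net, obs, sigma: float = 0.1):
    # Noise-for-prior variant sketched from the abstract: perturb each head's
    # Q-values with fresh zero-mean Gaussian noise instead of a fixed prior.
    q = net(obs)
    return q + sigma * torch.randn_like(q)


if __name__ == "__main__":
    obs = torch.randn(4, 8)                            # batch of 4 toy observations
    net = MultiHeadQNet(obs_dim=8, n_actions=6)
    prior_net = MultiHeadQNet(obs_dim=8, n_actions=6)  # fixed random prior network
    print(q_with_random_prior(net, prior_net, obs).shape)  # torch.Size([4, 10, 6])
    print(q_with_gaussian_noise(net, obs).shape)           # torch.Size([4, 10, 6])
```

Note the design difference the sketch highlights: a random prior perturbs each head with the same fixed function of the state throughout training, whereas Gaussian noise resamples the perturbation at every call, which is one plausible way to keep the heads from collapsing toward identical estimates.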