The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency to overfit to early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms. Previous studies have shown that simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias. However, we observe that in model-based reinforcement learning (MBRL), resetting the agent's parameters harms performance. On further investigation, we find that the primacy bias in MBRL differs from that in model-free RL. In this work, we investigate the primacy bias in MBRL and propose world model resetting, which remains effective in the MBRL setting. We apply our method to two different MBRL algorithms, MBPO and DreamerV2. We validate its effectiveness on multiple continuous control tasks in MuJoCo and the DeepMind Control Suite, as well as on discrete control tasks in the Atari 100k benchmark. The results show that world model resetting can significantly alleviate the primacy bias in the model-based setting and improve the algorithms' performance. We also provide guidance on how to perform world model resetting effectively.
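To make the idea of world model resetting concrete, the following is a minimal sketch, not the procedure used with MBPO or DreamerV2. The abstract does not specify a reset schedule or which parameters are reset, so the fixed `reset_interval`, the choice to keep the replay buffer across resets, and every name here (`TinyWorldModel`, `reset_parameters`, `train`) are illustrative assumptions. The "world model" is a toy one-parameter linear dynamics model.

```python
import random


class TinyWorldModel:
    """Hypothetical stand-in for a learned dynamics model: next_state ~ w * state."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.reset_parameters()

    def reset_parameters(self):
        # Re-draw the single weight from its initial distribution.
        self.w = self._rng.uniform(-0.1, 0.1)
        self.updates_since_reset = 0

    def update(self, batch, lr=0.1):
        # One gradient step on the mean squared one-step prediction error.
        grad = sum(2.0 * (self.w * s - s_next) * s for s, s_next in batch) / len(batch)
        self.w -= lr * grad
        self.updates_since_reset += 1


def train(num_steps=1100, reset_interval=250, seed=0):
    """Toy loop: collect data, periodically reset the world model, keep training."""
    rng = random.Random(seed)
    model = TinyWorldModel(seed)
    replay_buffer = []  # assumption: collected data is kept across resets
    num_resets = 0
    for step in range(1, num_steps + 1):
        s = rng.uniform(-1.0, 1.0)
        replay_buffer.append((s, 0.5 * s))  # toy true dynamics: s_next = 0.5 * s
        if step % reset_interval == 0:
            # Assumed schedule: re-initialize only the world model at fixed intervals.
            model.reset_parameters()
            num_resets += 1
        batch = rng.sample(replay_buffer, min(8, len(replay_buffer)))
        model.update(batch)
    return model, replay_buffer, num_resets


if __name__ == "__main__":
    model, buffer, num_resets = train()
    print(f"resets={num_resets}, buffer size={len(buffer)}, learned w={model.w:.3f}")
```

In this sketch only the model's parameters are re-initialized; the data collected so far is retained, so the freshly initialized model is refit on both early and recent transitions. Whether and how the policy, optimizer state, or other components should be treated is not stated in the abstract and is left out here.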