Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay, a form of prioritized experience replay tailored to model-based agents through the use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 by DreamerV3 with uniform replay, while also maintaining similar performance on the DeepMind Control Suite. Code for Curious Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay
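To make the idea of a curiosity-based priority signal concrete, the following is a minimal sketch of a replay buffer that samples stored experiences in proportion to a priority rather than uniformly. The abstract does not state the priority formula, so the form used here (a novelty bonus that decays each time an item is replayed, plus a term that grows with the world model's last recorded loss on that item) is an assumption made for illustration. The class name `CuriosityPrioritizedBuffer`, its methods, and the default values of `c`, `beta`, `alpha`, and `eps` are hypothetical and are not taken from the released code.

```python
import numpy as np


class CuriosityPrioritizedBuffer:
    """Toy ring buffer that samples items in proportion to a curiosity-style priority.

    Assumed priority (illustrative only, not taken from the abstract):
        priority_i = c * beta ** count_i + (|loss_i| + eps) ** alpha
    where count_i is how often item i has been sampled and loss_i is the
    most recent world-model loss recorded for it.
    """

    def __init__(self, capacity, c=1e4, beta=0.7, alpha=0.7, eps=0.01, seed=0):
        self.capacity = capacity
        self.c, self.beta, self.alpha, self.eps = c, beta, alpha, eps
        self.items = [None] * capacity
        self.counts = np.zeros(capacity)   # times each slot has been sampled
        self.losses = np.zeros(capacity)   # last model loss recorded per slot
        self.size = 0                      # number of filled slots
        self.next = 0                      # slot that the next add() writes to
        self.rng = np.random.default_rng(seed)

    def add(self, item):
        """Store an item, overwriting the oldest slot once the buffer is full."""
        i = self.next
        self.items[i] = item
        self.counts[i] = 0.0   # a new item has never been replayed
        self.losses[i] = 0.0   # and has no recorded model loss yet
        self.next = (i + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def priorities(self):
        """Return the unnormalized priority of every filled slot."""
        n = self.size
        novelty = self.c * self.beta ** self.counts[:n]
        model_error = (np.abs(self.losses[:n]) + self.eps) ** self.alpha
        return novelty + model_error

    def sample(self, batch_size):
        """Draw slot indices (with replacement) in proportion to priority."""
        p = self.priorities()
        idx = self.rng.choice(self.size, size=batch_size, p=p / p.sum())
        np.add.at(self.counts, idx, 1)  # counts repeated indices correctly
        return idx, [self.items[i] for i in idx]

    def update_losses(self, idx, new_losses):
        """Record the world-model loss measured on the sampled items."""
        self.losses[np.asarray(idx)] = np.asarray(new_losses, dtype=np.float64)


# Usage: store transitions, sample a batch, then feed back per-item model losses.
buffer = CuriosityPrioritizedBuffer(capacity=100)
for step in range(10):
    buffer.add({"step": step})
indices, batch = buffer.sample(batch_size=4)
buffer.update_losses(indices, new_losses=np.ones(len(indices)))
```

Under these assumptions, items that have rarely been replayed or that the world model still predicts poorly are drawn more often, whereas uniform replay gives every stored item the same chance regardless of how well it is already modeled.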