Rainbow Deep Q-Network (DQN) demonstrated that combining multiple independent enhancements could significantly boost a reinforcement learning (RL) agent's performance. In this paper, we present "Beyond The Rainbow" (BTR), a novel algorithm that integrates six improvements from across the RL literature into Rainbow DQN, establishing a new state-of-the-art for RL on a desktop PC, with a human-normalized interquartile mean (IQM) of 7.4 on Atari-60. Beyond Atari, we demonstrate BTR's capability to handle complex 3D games, successfully training agents to play Super Mario Galaxy, Mario Kart, and Mortal Kombat with minimal algorithmic changes. Because BTR is designed with computational efficiency in mind, agents can be trained on a desktop PC on 200 million Atari frames within 12 hours. Additionally, we conduct detailed ablation studies of each component, analyzing its performance and impact using numerous measures.
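The human-normalized IQM reported above can be made concrete with a short sketch. The scores below are hypothetical placeholders, not results from the paper; the normalization (0 = random play, 1 = human level) and the interquartile mean (average of the middle 50% of scores) follow the standard definitions used in Atari evaluation.

```python
import numpy as np

# Hypothetical per-game raw scores (illustrative only, not from the paper).
agent = np.array([12000.0, 300.0, 4500.0, 80.0])
random_play = np.array([200.0, 10.0, 500.0, 2.0])
human = np.array([7000.0, 250.0, 3000.0, 60.0])

# Human-normalized score per game: 0 = random play, 1 = human level.
normalized = (agent - random_play) / (human - random_play)

def iqm(x):
    """Interquartile mean: average the middle 50% of the sorted scores,
    discarding the bottom and top 25%. More robust to outlier games than
    the mean, and less pessimistic than the median."""
    x = np.sort(x)
    n = len(x)
    lo, hi = n // 4, n - n // 4
    return x[lo:hi].mean()

print(iqm(normalized))
```

An IQM of 7.4, as BTR achieves on Atari-60, thus means the middle half of the per-game distribution averages over seven times the human-level normalized score.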