Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none has achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework for sample-efficient RL. We extend the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With the series of improvements we propose, EfficientZero V2 outperforms the current state of the art (SOTA) by a significant margin on diverse tasks in the limited-data setting. In particular, EfficientZero V2 improves notably on the prevailing general algorithm, DreamerV3, achieving superior results in 50 of 66 evaluated tasks across diverse benchmarks such as Atari 100k, Proprio Control, and Vision Control.