Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none has achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework for sample-efficient RL algorithms. We extend EfficientZero's strong performance to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. Through a series of proposed improvements, EfficientZero V2 outperforms the current state of the art (SOTA) by a significant margin on diverse tasks under the limited-data setting. In particular, EfficientZero V2 marks a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior results on 50 of 66 evaluated tasks across diverse benchmarks, including Atari 100k, Proprio Control, and Vision Control.