We introduce BBF, a value-based RL agent that achieves super-human performance on the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, together with a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion of updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.