The ever-growing complexity of reinforcement learning (RL) tasks demands distributed RL systems that can efficiently generate and process massive amounts of data to train intelligent agents. However, existing open-source libraries suffer from various limitations that impede their practical use in challenging scenarios where large-scale training is necessary. While industrial systems from OpenAI and DeepMind have achieved successful large-scale RL training, their system architectures and implementation details remain undisclosed to the community. In this paper, we present a novel abstraction of the dataflows of RL training, which unifies practical RL training across diverse applications into a general framework and enables fine-grained optimizations. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLly Scalable RL (SRL). The system architecture of SRL separates the major RL computation components and allows massively parallelized training. Moreover, SRL offers user-friendly and extensible interfaces for customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries on both a single machine and a medium-sized cluster. In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by existing libraries. We also conduct a direct benchmark comparison against OpenAI's industrial system, Rapid, in the challenging hide-and-seek environment. SRL reproduces the same solution as reported by OpenAI with up to 5x speedup in wall-clock time. Furthermore, we examine the performance of SRL in a much harder variant of the hide-and-seek environment and achieve substantial learning speedup by scaling SRL to over 15k CPU cores and 32 A100 GPUs. Notably, SRL is the first in the academic community to perform RL experiments at such a large scale.