Advancing reinforcement learning (RL) requires tools that are flexible enough to prototype new methods easily while avoiding impractically slow experimental turnaround times. To meet the first requirement, the most popular RL libraries advocate highly modular agent composability, which facilitates experimentation and development. To meet the second, scaling RL to large sampling and computing resources has proved a successful strategy for solving challenging environments within reasonable time frames. However, this scalability has so far been difficult to combine with modularity. In this work, we explore design choices that allow agent composability at both the local and distributed levels of execution. We propose a versatile approach that allows RL agents to be defined at different scales through independent, reusable components. We demonstrate experimentally that our design choices allow us to reproduce classical benchmarks, explore multiple distributed architectures, and solve novel and complex environments, while giving the user full control over both the agent definition and the training scheme. We believe this work can provide useful insights for the next generation of RL libraries.