Reinforcement learning (RL) has shown promise in creating robust policies for robotics tasks. However, contemporary RL algorithms are data-hungry, often requiring billions of environment transitions to train successful policies. This necessitates the use of fast and highly parallelizable simulators. In addition to speed, such simulators need to model the physics of the robots and their interaction with the environment accurately enough for policies learned in simulation to transfer to reality. We present QuadSwarm, a fast, reliable simulator for research in single- and multi-robot RL for quadrotors that addresses both issues. With fast forward-dynamics propagation decoupled from rendering, QuadSwarm is designed to be highly parallelizable, so that throughput scales linearly with additional compute. It provides multiple components tailored to multi-robot RL, including diverse training scenarios, as well as domain randomization to facilitate the development and sim2real transfer of multi-quadrotor control policies. Initial experiments suggest that QuadSwarm achieves over 48,500 simulation samples per second (SPS) with a single quadrotor and over 62,000 SPS with eight quadrotors on a 16-core CPU. The code can be found at https://github.com/Zhehui-Huang/quad-swarm-rl.