Deep Reinforcement Learning (RL) can yield capable agents and control policies in several domains but is commonly plagued by prohibitively long training times. Additionally, in the case of continuous control problems, the applicability of learned policies on real-world embedded devices is limited because existing libraries lack real-time guarantees and portability. To address these challenges, we present RLtools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. Its novel architecture allows RLtools to be used on a wide variety of platforms, from HPC clusters, workstations, and laptops to smartphones, smartwatches, and microcontrollers. Specifically, due to the tight integration of the RL algorithms with simulation environments, RLtools can solve popular RL problems up to 76 times faster than other popular RL frameworks. We also benchmark inference on a diverse set of microcontrollers and show that in most cases our optimized implementation is by far the fastest. Finally, RLtools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of Tiny Reinforcement Learning (TinyRL). The source code, documentation, and live demos are available through our project page at https://rl.tools.