Deep Reinforcement Learning (RL) has been shown to yield capable agents and control policies in several domains, but it is commonly plagued by prohibitively long training times. Additionally, in the case of continuous control problems, the applicability of learned policies on real-world embedded devices is limited by the lack of real-time guarantees and the limited portability of existing deep learning libraries. To address these challenges, we present BackpropTools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. Leveraging the template meta-programming capabilities of recent C++ standards, we provide composable components that can be tightly integrated by the compiler. Its novel architecture allows BackpropTools to be used seamlessly on a heterogeneous set of platforms, from HPC clusters, workstations, and laptops to smartphones, smartwatches, and microcontrollers. Specifically, due to the tight integration of the RL algorithms with simulation environments, BackpropTools can solve popular RL problems like the Pendulum-v1 swing-up about 7 to 15 times faster in terms of wall-clock training time than other popular RL frameworks when using TD3. We also provide a low-overhead and parallelized interface to the MuJoCo simulator, showing that our PPO implementation achieves state-of-the-art returns in the Ant-v4 environment while reducing wall-clock training time by 25 to 30 percent. Finally, we benchmark policy inference on a diverse set of microcontrollers and show that in most cases our optimized inference implementation is much faster than even the manufacturers' DSP libraries. To the best of our knowledge, BackpropTools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of Tiny Reinforcement Learning (TinyRL). Project page: https://backprop.tools
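To make the idea of compiler-integrated, composable components concrete, the following is a minimal C++ sketch of a template-based network whose layer shapes are part of its type. It is an illustration only and not the BackpropTools API: `DenseLayer`, `Sequential`, `Policy`, and `evaluate_example` are hypothetical names, and the weights are hand-picked. The sketch assumes that fixing dimensions at compile time is what allows static storage (no heap allocation, no external dependencies) and lets the compiler inline and unroll the forward pass.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is NOT the BackpropTools API.

// Fully connected layer whose shape is part of its type. Weights live in
// fixed-size arrays (no heap allocation), and all loop bounds are
// compile-time constants that the compiler can unroll and inline.
template <typename T, std::size_t INPUT_DIM, std::size_t OUTPUT_DIM, bool RELU>
struct DenseLayer {
    using Scalar = T;
    static constexpr std::size_t INPUT = INPUT_DIM;
    static constexpr std::size_t OUTPUT = OUTPUT_DIM;
    T weights[OUTPUT_DIM][INPUT_DIM];
    T biases[OUTPUT_DIM];

    void forward(const T (&input)[INPUT_DIM], T (&output)[OUTPUT_DIM]) const {
        for (std::size_t i = 0; i < OUTPUT_DIM; ++i) {
            T acc = biases[i];
            for (std::size_t j = 0; j < INPUT_DIM; ++j) {
                acc += weights[i][j] * input[j];
            }
            // Optional ReLU, selected at compile time via the RELU flag.
            output[i] = (RELU && acc < T(0)) ? T(0) : acc;
        }
    }
};

// Two layers composed at compile time; mismatched shapes fail to compile.
template <typename FIRST, typename SECOND>
struct Sequential {
    static_assert(FIRST::OUTPUT == SECOND::INPUT, "layer dimensions must match");
    using Scalar = typename FIRST::Scalar;
    FIRST first;
    SECOND second;

    void forward(const Scalar (&input)[FIRST::INPUT],
                 Scalar (&output)[SECOND::OUTPUT]) const {
        Scalar hidden[FIRST::OUTPUT];  // intermediate activations on the stack
        first.forward(input, hidden);
        second.forward(hidden, output);
    }
};

// A toy 2-3-1 policy: ReLU hidden layer followed by a linear output layer.
using Policy = Sequential<DenseLayer<float, 2, 3, true>,
                          DenseLayer<float, 3, 1, false>>;

// Builds the toy policy with hand-picked weights and evaluates one observation.
inline float evaluate_example() {
    const Policy policy = {
        {{{1.0f, 0.0f}, {0.0f, 1.0f}, {-1.0f, -1.0f}}, {0.0f, 0.0f, 0.0f}},
        {{{1.0f, 2.0f, 3.0f}}, {0.5f}}
    };
    const float observation[2] = {2.0f, -1.0f};
    float action[1];
    policy.forward(observation, action);
    return action[0];
}
```

With these weights, the hidden activations for the observation (2, -1) are (2, 0, 0) after the ReLU, so `evaluate_example()` returns 0.5 + 1·2 = 2.5. Because the whole network is a plain aggregate of fixed-size arrays, a sketch like this needs nothing beyond a standard C++ compiler, which is the kind of property that would matter on a microcontroller.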