Resource-constrained robotic platforms are particularly useful for tasks that call for low-cost hardware, either because the robot may be lost, as in search-and-rescue applications, or because many devices are needed, as in swarm robotics. It is therefore crucial to find mechanisms for adapting reinforcement learning techniques to the limited computational power and small memory of these ultra-low-cost platforms. We address this need by proposing a method for making imitation learning deployable on resource-constrained robotic platforms. We cast the imitation learning problem as a conditional sequence modeling task and train a decision transformer on expert demonstrations augmented with a custom reward. We then compress the resulting generative model using software optimization schemes, including quantization and pruning. We test our method in simulation using Isaac Gym, a realistic physics simulation environment designed for reinforcement learning. We empirically demonstrate that our method achieves natural-looking gaits for Bittle, a resource-constrained quadruped robot. We also run multiple simulations to show the effects of pruning and quantization on model performance. Our results show that quantization (down to 4 bits) and pruning reduce model size by around 30% while maintaining a competitive reward, making the model deployable on a resource-constrained system.
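The abstract names quantization and pruning as the compression steps but does not say which variants are used. As an illustration only, the sketch below assumes unstructured magnitude pruning followed by uniform symmetric quantization to 4 bits, applied to a flat list of weights. The function names, the 50% sparsity level, and the random stand-in weights are all hypothetical and do not come from the paper.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices sorted by ascending magnitude; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def quantize_symmetric(weights, bits=4):
    """Map weights to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floating-point weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical stand-in for one layer's weights (not data from the paper).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(64)]

pruned = magnitude_prune(weights, sparsity=0.5)     # assumed sparsity level
codes, scale = quantize_symmetric(pruned, bits=4)   # 4 bits, as in the abstract
restored = dequantize(codes, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(pruned, restored))
```

In this sketch the pruned weights stay exactly zero after quantization, because zero maps to code 0, so the two steps compose without interfering. The actual size reduction depends on how the sparse, low-bit weights are stored, which this sketch does not model.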