This work applies reinforcement learning to improve the performance of a highly dynamic hopping system built on a parallel mechanism. Unlike serial mechanisms, parallel mechanisms are difficult to simulate accurately because of their complex kinematic constraints and closed-loop structures. In addition, learning to hop is hampered by the prolonged aerial phase and by sparse rewards. To address these challenges, we propose a learning framework that encodes long-history feedback to account for the under-actuation introduced by the prolonged aerial phase. Within this framework, we also introduce a simplified serial configuration that stands in for the parallel design, so that the parallel structure need not be simulated directly during training. A torque-level conversion between the serial and parallel configurations is designed to address the resulting sim-to-real gap. Simulation and hardware experiments validate the framework.
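As an illustration of what a torque-level serial-to-parallel conversion can look like, the sketch below uses the standard virtual-work relation: if the serial-equivalent joint angles are a function q_s = f(q_p) of the parallel actuator angles with Jacobian J = ∂f/∂q_p, then equal mechanical power in both descriptions gives τ_p = Jᵀ τ_s. This is an assumption made for illustration and not necessarily the conversion used in this work. The mapping `serial_from_parallel` and all function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def serial_from_parallel(q_p: Vector) -> Vector:
    """Hypothetical map from two parallel actuator angles to two
    serial-equivalent joint angles (a placeholder, not a real robot model)."""
    q1, q2 = q_p
    return [0.5 * (q1 + q2), q2 - q1]


def numeric_jacobian(f: Callable[[Vector], Vector], q: Vector,
                     eps: float = 1e-6) -> List[Vector]:
    """Central-difference Jacobian with J[i][j] = d f_i / d q_j."""
    n_out = len(f(q))
    jac = [[0.0] * len(q) for _ in range(n_out)]
    for j in range(len(q)):
        q_plus, q_minus = list(q), list(q)
        q_plus[j] += eps
        q_minus[j] -= eps
        f_plus, f_minus = f(q_plus), f(q_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def serial_to_parallel_torque(tau_s: Vector, q_p: Vector,
                              f: Callable[[Vector], Vector] = serial_from_parallel
                              ) -> Vector:
    """Map serial-equivalent joint torques to parallel actuator torques
    via tau_p = J^T tau_s (virtual-work assumption)."""
    jac = numeric_jacobian(f, q_p)
    return [sum(jac[i][j] * tau_s[i] for i in range(len(tau_s)))
            for j in range(len(q_p))]


# Torques a policy trained on the serial model might command (made-up values).
tau_p = serial_to_parallel_torque([2.0, 1.0], [0.3, -0.2])
```

In such a scheme the policy would be trained against the serial model only, and a conversion of this kind would run on the hardware side to translate its torque commands for the parallel actuators.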