Spring-based actuators in legged locomotion provide energy efficiency and improved performance, but they make controller design more difficult. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make this controller more robust and further improve performance, we use reinforcement learning to close the loop by learning corrective actions on top of the CPGs. We evaluate the proposed approach on the DLR elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion, including the fastest walking gait recorded on bert, despite the approach being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.
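The two-stage structure described above (an open-loop CPG plus learned corrective actions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sinusoidal oscillator form, the per-leg phase offsets in `GAIT_OFFSETS`, the single joint target per leg, the `policy` callable, and the `max_correction_rad` bound are all hypothetical.

```python
import math

# Hypothetical phase offsets (fraction of a cycle) for legs FL, FR, HL, HR.
GAIT_OFFSETS = {
    "trot": (0.0, 0.5, 0.5, 0.0),   # diagonal leg pairs move in phase
    "pronk": (0.0, 0.0, 0.0, 0.0),  # all four legs move in phase
}


def cpg_targets(t, gait, frequency_hz, amplitude_rad):
    """Open-loop joint targets: one sine oscillator per leg (assumed form)."""
    return [
        amplitude_rad * math.sin(2.0 * math.pi * (frequency_hz * t + offset))
        for offset in GAIT_OFFSETS[gait]
    ]


def closed_loop_targets(t, gait, frequency_hz, amplitude_rad,
                        policy, observation, max_correction_rad=0.1):
    """Add bounded corrections from a learned policy to the CPG output.

    `policy` is any callable mapping an observation to four corrections;
    in the setting above it would be trained with reinforcement learning.
    """
    open_loop = cpg_targets(t, gait, frequency_hz, amplitude_rad)
    corrections = policy(observation)
    return [
        q + max(-max_correction_rad, min(max_correction_rad, dq))
        for q, dq in zip(open_loop, corrections)
    ]
```

In this sketch, `frequency_hz` and `amplitude_rad` stand in for the CPG parameters that would be optimized in the first stage. Bounding the correction keeps the learned policy close to the open-loop gait, which is one plausible design choice rather than the one confirmed by the source.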