Control of legged robots is a challenging problem that has been investigated with different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework that combines the strengths of both approaches. Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Programming and the Raibert heuristic, which serves as an expert policy. We then train a clone of the MPC with imitation learning to make the controller learnable. Finally, we leverage deep reinforcement learning with limited exploration to further finetune the policy on more challenging terrains. Through comprehensive simulation and hardware experiments, we demonstrate that the proposed IFM framework significantly improves the performance of the given MPC controller on rough, slippery, and conveyor terrains that require careful coordination of footsteps. We also show that IFM efficiently produces more symmetric, periodic, and energy-efficient gaits than vanilla RL, with minimal reward shaping.
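The abstract names the Raibert heuristic as one ingredient of the expert MPC but does not give its form. As a rough illustration only, a commonly used version places each foot at the hip's ground projection, shifted by half the distance the body travels during stance, plus a feedback term on the velocity error. The function name `raibert_foothold`, the gain `k_v`, and the exact formula below are assumptions for this sketch, not the authors' implementation.

```python
def raibert_foothold(hip, vel, vel_cmd, stance_time, k_v=0.03):
    """Hypothetical helper: horizontal foothold from a Raibert-style heuristic.

    Assumed form (not taken from the paper):
        p_foot = p_hip + (T_stance / 2) * v + k_v * (v - v_cmd)

    hip:         hip position projected onto the ground, (x, y) in meters
    vel:         measured base velocity, (vx, vy) in m/s
    vel_cmd:     commanded base velocity, (vx, vy) in m/s
    stance_time: stance duration T_stance in seconds
    k_v:         velocity-error feedback gain (illustrative value)
    """
    foothold = []
    for h, v, vc in zip(hip, vel, vel_cmd):
        neutral = h + 0.5 * stance_time * v  # symmetric-stance neutral point
        correction = k_v * (v - vc)          # step further if faster than commanded
        foothold.append(neutral + correction)
    return tuple(foothold)


# Example: hip at (0.2, 0.1) m, trotting forward at the commanded 0.5 m/s,
# 0.3 s stance. The velocity error is zero, so only the neutral-point shift applies.
print(raibert_foothold((0.2, 0.1), (0.5, 0.0), (0.5, 0.0), 0.3))
```

In a pipeline like the one described, a heuristic of this kind would supply footstep targets while the trajectory optimizer handles the body and contact forces; the gain and stance time here are placeholders.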