We propose MIMOC: Motion Imitation from Model-Based Optimal Control. MIMOC is a Reinforcement Learning (RL) controller that learns agile locomotion by imitating reference trajectories from model-based optimal control. MIMOC mitigates challenges faced by other motion imitation RL approaches because the references are dynamically consistent, require no motion retargeting, and include torque references. Hence, MIMOC does not require fine-tuning. MIMOC is also less sensitive to modeling and state estimation inaccuracies than model-based controllers. We validate MIMOC on the Mini-Cheetah in outdoor environments over a wide variety of challenging terrain, and on the MIT Humanoid in simulation. We show cases where MIMOC outperforms model-based optimal controllers, and show that imitating torque references improves the policy's performance.