Model-based approaches to planning and control for bipedal locomotion have a long history of success. They can provide stability and safety guarantees while remaining effective across many locomotion tasks. Model-free reinforcement learning, on the other hand, has gained much popularity in recent years due to computational advancements. It can achieve high performance in specific tasks, but it lacks physical interpretability and the flexibility to re-purpose a policy for a different set of tasks. For instance, we can initially train a neural network (NN) policy using velocity commands as inputs. However, to handle new task commands, such as desired hand or footstep locations at a desired walking velocity, we must train a new NN policy. In this work, we attempt to bridge the gap between these two bodies of work on a bipedal platform. We formulate a model-based reinforcement learning problem to learn a reduced-order model (ROM) within a model predictive control (MPC) framework. Results show a 49% improvement in viable task region size and a 21% reduction in motor torque cost. All videos and code are available at https://sites.google.com/view/ymchen/research/rl-for-roms.