This paper presents a control framework that combines model-based optimal control with reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach augments RL training with reference motions generated on demand by finite-horizon optimal control, covering a broad range of velocities and gaits. The RL policy is trained to imitate these reference motions, which makes it possible to learn robust control policies reliably. Furthermore, because training uses realistic simulation data that captures whole-body dynamics, RL overcomes the limitations that modeling simplifications impose on the reference motions. We validate the robustness and controllability of the RL training process within our framework through a series of experiments. In these experiments, our method generalizes the reference motions and, owing to the flexibility of RL, handles more complex locomotion tasks that may be challenging for the simplified model. Additionally, our framework supports training control policies for robots of different dimensions without robot-specific adjustments to the reward function or hyperparameters.