This paper presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances RL training by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as imitation targets for the RL policy, enabling robust control policies to be learned reliably. Furthermore, by exploiting realistic simulation data that captures whole-body dynamics, RL overcomes the limitations that modeling simplifications impose on the reference motions. We validate the robustness and controllability of RL training within our framework through a series of experiments. In these experiments, our method demonstrates its ability, owing to RL's flexibility, to generalize the reference motions and to handle more complex locomotion tasks that would challenge the simplified model. Additionally, our framework readily supports training control policies for robots of diverse dimensions, without robot-specific tuning of the reward function or hyperparameters.