Stable gait generation is a crucial problem for legged robot locomotion because it affects other critical performance factors, such as mobility over uneven terrain and power consumption. The stability of the generated gait results from efficient control of the interaction between the legged robot's body and the environment in which it moves. Here, we study how this can be achieved by combining model-predictive and predictive reinforcement learning controllers. Model-predictive control (MPC) is a well-established method that does not use any online learning (except in some adaptive variants) and provides a convenient interface for managing state constraints. Reinforcement learning (RL), in contrast, relies on adaptation based purely on experience. In its bare-bones variants, RL is not always suitable for robots because of their high complexity and the cost of simulation and experimentation. In this work, we combine both control methods to address the problem of stable gait generation for quadrupedal robots. The hybrid approach that we develop and apply uses a cost roll-out algorithm with a tail cost in the form of a Q-function modeled by a neural network; this alleviates the computational complexity, which grows exponentially with the prediction horizon in a purely MPC approach. We demonstrate that our RL gait controller achieves stable locomotion at short horizons, where a nominal MPC controller fails. Further, our controller is capable of live operation, meaning that it does not require prior training. Our results suggest that hybridizing MPC with RL, as presented here, helps achieve a good balance between online control capabilities and computational complexity.
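The roll-out-with-tail-cost idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar dynamics in `step`, the quadratic `stage_cost`, the three-element action set, and the brute-force search. In particular, `q_tail` is a fixed hand-written function standing in for the neural-network Q-function. The sketch scores the last state-action pair of the roll-out with the Q-function and all earlier pairs with the stage cost, which is one common way to write such an objective.

```python
import itertools

# Hypothetical toy problem (not from the paper): scalar dynamics,
# quadratic stage cost, and a small discrete action set.
ACTIONS = (-1.0, 0.0, 1.0)


def step(x, u):
    """Assumed dynamics: x_next = x + 0.5 * u."""
    return x + 0.5 * u


def stage_cost(x, u):
    """Assumed quadratic running cost."""
    return x * x + 0.1 * u * u


def q_tail(x, u):
    """Stand-in for the learned Q-function (a neural network in the paper).

    Here it is a fixed function: the stage cost plus an assumed quadratic
    estimate of the cost-to-go from the next state.
    """
    return stage_cost(x, u) + 2.0 * step(x, u) ** 2


def rollout_cost(x0, actions, tail=None):
    """Sum stage costs along the roll-out.

    If `tail` is given, the last state-action pair is scored by `tail`
    instead of the stage cost.
    """
    x, total = x0, 0.0
    last = len(actions) - 1
    for k, u in enumerate(actions):
        if tail is not None and k == last:
            total += tail(x, u)
        else:
            total += stage_cost(x, u)
        x = step(x, u)
    return total


def best_first_action(x0, horizon, tail=None):
    """Brute-force search over all len(ACTIONS) ** horizon action sequences."""
    candidates = itertools.product(ACTIONS, repeat=horizon)
    best = min(candidates, key=lambda seq: rollout_cost(x0, seq, tail))
    return best[0]


if __name__ == "__main__":
    print(best_first_action(2.0, 1))          # plain roll-out, horizon 1
    print(best_first_action(2.0, 1, q_tail))  # roll-out with Q-function tail
```

In this toy setting, the plain horizon-1 roll-out from `x0 = 2.0` picks the action `0.0`, because it only sees the immediate cost, while the version with the tail picks `-1.0` and moves the state toward the origin. The exhaustive search visits `3 ** horizon` sequences, which shows why a longer horizon quickly becomes expensive and why a tail cost that permits a shorter horizon is attractive. A practical controller would use a numerical optimizer over continuous actions rather than enumeration.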