Reinforcement learning (RL) has become a promising approach to developing controllers for quadrupedal robots. Conventionally, RL for locomotion follows a position-based paradigm: an RL policy outputs target joint positions at a low frequency, and a high-frequency proportional-derivative (PD) controller tracks these targets to produce joint torques. In model-based control of quadrupedal locomotion, by contrast, there has been a paradigm shift from position-based to torque-based control. In light of these advances in model-based control, we explore an alternative to the position-based RL paradigm by introducing a torque-based RL framework, in which an RL policy directly predicts joint torques at a high frequency, thereby dispensing with the PD controller. The proposed learning torque control framework is validated through extensive experiments, in which a quadruped traverses various terrains and resists external disturbances while following user-specified commands. Furthermore, compared to learning position control, learning torque control shows the potential to achieve higher rewards and is more robust to significant external disturbances. To our knowledge, this is the first sim-to-real attempt at end-to-end learning of torque control for quadrupedal locomotion.
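To make the difference between the two paradigms concrete, the following is a minimal sketch on a single toy joint. It is not the authors' implementation: the gains (`kp`, `kd`), the 1 kHz sub-step, the decimation factor of 20, the toy dynamics in `integrate`, and the stand-in policies are all hypothetical choices for illustration. In the position-based loop the policy runs once per `decimation` sub-steps and a PD law converts its target into torque; in the torque-based loop the policy is queried at every sub-step and its output is applied directly.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (joint position q, joint velocity dq) for one toy joint

SIM_DT = 0.001  # hypothetical 1 kHz low-level control/simulation step


def pd_torque(q_target: float, q: float, dq: float, kp: float = 20.0, kd: float = 2.0) -> float:
    # PD law tracking a position target with zero target velocity:
    # tau = kp * (q_target - q) - kd * dq
    return kp * (q_target - q) - kd * dq


def integrate(state: State, tau: float, inertia: float = 0.1) -> State:
    # Toy single-joint dynamics (semi-implicit Euler); a stand-in for the simulator or robot.
    q, dq = state
    dq = dq + (tau / inertia) * SIM_DT
    q = q + dq * SIM_DT
    return (q, dq)


def run_position_based(policy: Callable[[State], float], state: State,
                       policy_steps: int, decimation: int = 20) -> State:
    # The policy outputs a target joint position at the low rate (1 kHz / 20 = 50 Hz here);
    # the PD controller turns that target into a torque at every high-rate sub-step.
    for _ in range(policy_steps):
        q_target = policy(state)
        for _ in range(decimation):
            tau = pd_torque(q_target, *state)
            state = integrate(state, tau)
    return state


def run_torque_based(policy: Callable[[State], float], state: State, steps: int) -> State:
    # The policy outputs the joint torque directly at the high rate; no PD controller is used.
    for _ in range(steps):
        tau = policy(state)
        state = integrate(state, tau)
    return state
```

As a usage example, `run_position_based(lambda s: 0.5, (0.0, 0.0), policy_steps=100)` drives the joint to 0.5 rad over 2 simulated seconds. In `run_torque_based`, a learned network would replace the hand-written `lambda s: pd_torque(0.5, *s)` stand-in. Because such a policy is not constrained to a PD structure, it can in principle produce torque profiles that a fixed-gain PD tracker cannot.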