Stable locomotion in precipitous environments is an essential capability for quadruped robots, demanding the ability to resist various external disturbances. Recent neural policies enhance robustness against disturbances by learning to resist external forces sampled from a fixed distribution in the simulated environment. However, the force generation process does not consider the robot's current state, making it difficult to identify the most effective direction and magnitude that will push the robot to the most unstable yet recoverable state; the training buffer therefore contains too few challenging cases to optimize robustness well. In this paper, we propose to model robust locomotion learning as an adversarial interaction between the locomotion policy and a learnable disturbance that is conditioned on the robot state to generate appropriate external forces. To keep the joint optimization stable, our novel $H_{\infty}$ constraint bounds the ratio between the cost and the intensity of the external forces. We verify the robustness of our approach in both simulated environments and real-world deployment, on quadrupedal locomotion tasks and on a more challenging task in which the quadruped walks on its hind legs only. Training and deployment code will be made public.
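To make the adversarial setup concrete, the following is a minimal toy sketch of one interaction step: a state-conditioned disturbance produces an external force, its magnitude is capped so the pushed state remains recoverable, and an $H_{\infty}$-style check bounds the ratio of cost to disturbance intensity. All names, dimensions, and constants (`disturbance_policy`, `ETA`, `INTENSITY_CAP`, the linear adversary, the placeholder cost) are illustrative assumptions, not the paper's actual formulation.

```python
import math
import random

random.seed(0)

# Toy dimensions and bounds (illustrative assumptions only).
STATE_DIM, FORCE_DIM = 12, 3
ETA = 2.0             # assumed bound on cost / disturbance-intensity ratio
INTENSITY_CAP = 30.0  # cap keeping the perturbed robot state recoverable

def disturbance_policy(state, W):
    """State-conditioned adversary: a linear map from robot state to force."""
    return [sum(w * s for w, s in zip(row, state)) for row in W]

def constrained_force(state, W):
    """Rescale the adversarial force so its magnitude stays within the cap."""
    f = disturbance_policy(state, W)
    norm = math.sqrt(sum(x * x for x in f))
    if norm > INTENSITY_CAP:
        f = [x * INTENSITY_CAP / norm for x in f]
    return f

def hinf_constraint_satisfied(cost, force):
    """H-infinity-style check: cost must not exceed ETA * ||force||^2."""
    intensity = sum(x * x for x in force)
    return cost <= ETA * intensity

# One toy interaction step between policy rollout and learnable disturbance.
W = [[random.gauss(0.0, 0.1) for _ in range(STATE_DIM)] for _ in range(FORCE_DIM)]
state = [random.gauss(0.0, 1.0) for _ in range(STATE_DIM)]
force = constrained_force(state, W)
cost = 0.5 * sum(x * x for x in force)  # placeholder locomotion cost
print(hinf_constraint_satisfied(cost, force))
```

In a full training loop, the adversary's weights would be updated to maximize the policy's cost subject to this ratio bound, while the locomotion policy is updated to minimize it.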