Hybrid Internal Model: A Simple and Efficient Learner for Agile Legged Locomotion

Source: arXiv. The authors report that 1 hour of training is enough for a quadruped robot to traverse any terrain under any disturbances in the open world. Project page: https://github.com/OpenRobotLab/HIMLoco

Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots provide only partial and noisy observations, which makes estimation particularly challenging, especially for external states such as terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we treat these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them from the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and an implicit stability representation, corresponding to the two primary goals of locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: It needs only the robot's proprioception, i.e., readings from the joint encoders and IMU, as observations. It keeps observations consistent between the simulation reference and reality, which avoids the information loss of imitation (mimic) learning. It exploits batch-level information, which makes it more robust to noise and more sample-efficient. It requires only 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and in cases that never occurred during training, revealing remarkable open-world generalizability.
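
To make the contrastive step more concrete, here is a minimal sketch of a batch-level contrastive loss that pulls each embedding toward the encoding of its own successor state and away from the other successors in the batch. It uses the generic InfoNCE loss as a stand-in: the abstract does not specify the exact objective, so this is not the authors' implementation. The function names, the temperature value, and the toy data are all assumptions made for illustration.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce_loss(embeddings, successors, temperature=0.1):
    """Generic batch-level InfoNCE loss (an assumed stand-in, not HIM's loss).

    embeddings[i] is a hypothetical encoder output for sample i, and
    successors[i] is the encoded successor state of the same sample.
    The positive for row i is successors[i]; every other row in the
    batch acts as a negative.
    """
    z = [l2_normalize(v) for v in embeddings]
    s = [l2_normalize(v) for v in successors]
    total = 0.0
    for i, zi in enumerate(z):
        # Cosine similarity of embedding i with every successor in the batch.
        logits = [sum(a * b for a, b in zip(zi, sj)) / temperature for sj in s]
        # Numerically stable log-sum-exp, then the negative log-softmax at index i.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(z)


# Toy data: random vectors standing in for encoded successor states.
random.seed(0)
batch, dim = 8, 4
successors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]

aligned = [list(v) for v in successors]     # each embedding matches its own successor
shuffled = successors[1:] + successors[:1]  # each embedding matches another sample's successor

loss_aligned = info_nce_loss(aligned, successors)
loss_shuffled = info_nce_loss(shuffled, successors)
print(f"aligned: {loss_aligned:.4f}  shuffled: {loss_shuffled:.4f}")
```

The aligned batch gives the lower loss, because every embedding is most similar to its own successor. Because each sample is scored against the whole batch rather than regressed onto a single target, this kind of objective illustrates what "batch-level information" can mean in practice.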
