Hybrid Internal Model: A Simple and Efficient Learner for Agile Legged Locomotion

Source: arXiv. In 1 hour of training, a quadruped robot learns to traverse any terrain under any disturbances in the open world. Project page: https://github.com/OpenRobotLab/HIMLoco

Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots provide only partial and noisy observations, which makes estimation particularly challenging, especially for external states such as terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we treat these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them from the robot's response. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and an implicit stability representation, corresponding to the two primary goals of locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits. It needs only the robot's proprioception as observations, i.e., readings from the joint encoders and the IMU. It keeps observations consistent between the simulation reference and reality, which avoids the information loss of imitation learning. It exploits batch-level information, which makes it more robust to noise and more sample-efficient. It requires only 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and in cases that never occurred during training, revealing remarkable open-world generalizability.
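
To make the contrastive step concrete, here is a minimal sketch of how an embedding can be pulled toward the encoding of its own successor state while being pushed away from the other samples in the batch. The abstract does not specify the exact contrastive objective, so this uses a generic InfoNCE-style loss as a stand-in. The function names, the temperature value, and the toy data are all hypothetical and not taken from the HIMLoco code.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce_loss(embeddings, successor_targets, temperature=0.1):
    """InfoNCE-style loss over a batch (an assumed stand-in objective).

    embeddings[i]        : predicted internal embedding for sample i
    successor_targets[i] : encoding of the successor state for sample i
    Pair (i, i) is the positive; every (i, j != i) pair is a negative.
    """
    z = [l2_normalize(v) for v in embeddings]
    t = [l2_normalize(v) for v in successor_targets]
    total = 0.0
    for i, zi in enumerate(z):
        # Cosine similarity of embedding i against every target in the batch.
        logits = [sum(a * b for a, b in zip(zi, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching successor as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(z)


# Toy data: 16 samples with 8-dimensional encodings (hypothetical sizes).
random.seed(0)
targets = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
# Embeddings that nearly match their own successor encoding.
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in targets]
# The same embeddings, rotated by one position so each is paired with the wrong successor.
shuffled = aligned[1:] + aligned[:1]

loss_aligned = info_nce_loss(aligned, targets)
loss_shuffled = info_nce_loss(shuffled, targets)
print(loss_aligned < loss_shuffled)
```

Because every other sample in the batch acts as a negative, the loss depends on the whole batch rather than on a single pair, which is one way to read the abstract's remark about exploiting batch-level information.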
