Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

From arXiv. The authors' headline claim: 1 hour of training yields a quadruped robot capable of traversing any terrain under any disturbances in the open world. Project page: https://github.com/OpenRobotLab/HIMLoco

Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots provide only partial and noisy observations, which makes estimation challenging, especially for external states such as terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we treat these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them from the robot's response. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and an implicit stability representation, corresponding to the two primary goals of locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: it needs only the robot's proprioception, i.e., observations from joint encoders and the IMU; it keeps observations consistent between the simulation reference and reality, which avoids the information loss of imitation learning; it exploits batch-level information, which makes it more robust to noise and more sample-efficient; and it requires only 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. Extensive real-world experiments demonstrate its agility, even in high-difficulty tasks and in cases that never occurred during training, revealing remarkable open-world generalizability.
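To make the contrastive step more concrete, here is a minimal sketch of pulling a predicted embedding toward the embedding of the robot's successor state, using the other samples in the batch as negatives. The abstract only says "contrastive learning", so the InfoNCE loss used here is a generic stand-in and not necessarily the objective the authors use. The function names, embedding size, batch size, and temperature are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce_loss(pred_embedding, successor_embedding, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not confirmed as the HIM objective).

    Row i of pred_embedding is the positive pair of row i of
    successor_embedding; every other row in the batch acts as a negative.
    """
    z = l2_normalize(pred_embedding)
    k = l2_normalize(successor_embedding)
    logits = z @ k.T / temperature                       # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # positives lie on the diagonal

# Toy data: 8 samples with 16-dimensional embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
successor = rng.normal(size=(8, 16))
aligned = successor + 0.01 * rng.normal(size=(8, 16))    # close to the targets
unrelated = rng.normal(size=(8, 16))                     # independent of the targets

loss_aligned = info_nce_loss(aligned, successor)
loss_unrelated = info_nce_loss(unrelated, successor)
print(f"aligned: {loss_aligned:.4f}, unrelated: {loss_unrelated:.4f}")
```

Embeddings that match their successor-state targets give a much lower loss than unrelated ones. Because the negatives come from the rest of the batch, the loss depends on batch-level statistics rather than on each sample in isolation.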
