DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to learn directly from offline multimodal datasets containing a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco reproduces multimodality in performing various locomotion skills, transfers zero-shot to real quadrupedal robots, and can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. In extensive real-world benchmarking experiments, DiffuseLoco exhibits better stability and velocity tracking performance than prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers by scaling large, expressive models and diverse offline datasets.
