SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning

We present a low-cost legged mobile manipulation system that solves long-horizon real-world tasks and is trained by reinforcement learning purely in simulation. The system is made possible by 1) a hierarchical design that pairs a high-level policy for instruction-following visual-mobile manipulation with a low-level policy for quadruped movement and limb control, 2) a progressive exploration and learning approach that leverages privileged task decomposition information to train a teacher policy for long-horizon tasks, which then guides an imitation-based student policy for efficient training of the high-level visuomotor policy, and 3) a suite of techniques for minimizing sim-to-real gaps. In contrast to previous approaches that use high-end equipment, our system performs effectively on more accessible hardware (a Unitree Go1 quadruped, a WidowX250S arm, and a single wrist-mounted RGB camera), despite the increased challenges this poses for sim-to-real transfer. Trained entirely in simulation, a single policy autonomously solves long-horizon tasks such as search, move, grasp, and drop-into, achieving nearly 80% success. This success rate is comparable to that of expert human teleoperation on the same tasks, while the policy is significantly more efficient, operating at about 1.5x the speed. Sim-to-real transfer is fluid across diverse indoor and outdoor scenes under varying lighting conditions. Finally, we discuss the key techniques that make the entire pipeline, including efficient RL training and sim-to-real transfer, work effectively for legged mobile manipulation, and we present ablation results for them.
