SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning

We present a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation. This system is made possible by 1) a hierarchical design that pairs a high-level policy for instruction-following visual-mobile manipulation with a low-level quadruped locomotion policy, 2) a teacher-student training pipeline for the high level, which first trains a teacher to tackle long-horizon tasks using privileged task decomposition and target object information, and then trains a student for visual-mobile manipulation via RL guided by the teacher's behavior, and 3) a suite of techniques for minimizing the sim-to-real gap. In contrast to many previous works that use high-end equipment, our system demonstrates effective performance with more accessible hardware -- specifically, a Unitree Go1 quadruped, a WidowX-250S arm, and a single wrist-mounted RGB camera -- despite the increased challenges of sim-to-real transfer. Trained fully in simulation, a single policy autonomously solves long-horizon tasks involving search, move to, grasp, transport, and drop into, achieving nearly 80% real-world success. This performance is comparable to that of expert human teleoperation on the same tasks, while the robot is more efficient, operating at about 1.5x the speed of teleoperation. Finally, we perform extensive ablations on key techniques for efficient RL training and effective sim-to-real transfer, and demonstrate effective deployment across diverse indoor and outdoor scenes under various lighting conditions.
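
To make the hierarchical design in point 1) concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the high-level policy emits a base velocity command plus arm joint targets, and that it is re-queried only every few low-level steps. The names `HighLevelCommand`, `run_hierarchy`, `decimation`, and both stub policies are hypothetical, and the abstract does not state the actual interfaces or control rates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class HighLevelCommand:
    # Hypothetical interface: the abstract does not specify the command format.
    base_velocity: Sequence[float]  # assumed (vx, vy, yaw_rate) for the quadruped
    arm_targets: Sequence[float]    # assumed joint targets for the arm


def run_hierarchy(
    high_level: Callable[[Dict], HighLevelCommand],
    low_level: Callable[[Dict, Sequence[float]], List[float]],
    observations: Sequence[Dict],
    decimation: int = 4,  # assumption: high level runs once per 4 low-level steps
) -> List[Tuple[int, HighLevelCommand, List[float]]]:
    """Step a two-level controller over a sequence of observations."""
    log = []
    command = None
    for step, obs in enumerate(observations):
        # Re-query the high-level (visuomotor) policy only every `decimation` steps.
        if step % decimation == 0:
            command = high_level(obs)
        # The low-level locomotion policy tracks the latest base velocity command.
        leg_targets = low_level(obs, command.base_velocity)
        log.append((step, command, leg_targets))
    return log


def stub_high_level(obs: Dict) -> HighLevelCommand:
    # Placeholder for a learned policy: forward speed scales with a fake image feature.
    return HighLevelCommand(
        base_velocity=(0.1 * obs["image_mean"], 0.0, 0.0),
        arm_targets=(0.0,) * 6,
    )


def stub_low_level(obs: Dict, base_velocity: Sequence[float]) -> List[float]:
    # Placeholder for a learned locomotion policy: 12 leg joint targets.
    return [base_velocity[0]] * 12


if __name__ == "__main__":
    fake_observations = [{"image_mean": float(i)} for i in range(8)]
    for step, command, leg_targets in run_hierarchy(
        stub_high_level, stub_low_level, fake_observations
    ):
        print(step, command.base_velocity, leg_targets[0])
```

In this sketch the split keeps the learned visuomotor policy free of leg-level control: it only has to decide where the base should move and what the arm should do, while the locomotion policy handles tracking.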
