We present Universal Manipulation Interface (UMI), a data collection and policy learning framework that allows direct skill transfer from in-the-wild human demonstrations to deployable robot policies. UMI employs hand-held grippers coupled with careful interface design to enable portable, low-cost, and information-rich data collection for challenging bimanual and dynamic manipulation demonstrations. To facilitate deployable policy learning, UMI incorporates a carefully designed policy interface with inference-time latency matching and a relative-trajectory action representation. The resulting learned policies are hardware-agnostic and deployable across multiple robot platforms. Equipped with these features, the UMI framework unlocks new robot manipulation capabilities, enabling zero-shot generalizable dynamic, bimanual, precise, and long-horizon behaviors by changing only the training data for each task. We demonstrate UMI's versatility and efficacy with comprehensive real-world experiments, in which policies learned via UMI generalize zero-shot to novel environments and objects when trained on diverse human demonstrations. UMI's hardware and software system is open-sourced at https://umi-gripper.github.io.
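To make the idea of a relative-trajectory action representation concrete, the following is a minimal sketch, not UMI's actual implementation. It assumes planar poses `(x, y, theta)` for brevity, and the helper names `to_relative` and `to_absolute` are hypothetical. Each future pose is expressed in the frame of the current end-effector pose, so the same motion yields the same action values regardless of where the gripper or robot base sits in the world.

```python
import math

def to_relative(current, future):
    """Express a planar pose (x, y, theta) in the frame of `current`.

    Illustrative assumption: poses are SE(2) tuples; angles are not wrapped.
    """
    cx, cy, ct = current
    fx, fy, ft = future
    dx, dy = fx - cx, fy - cy
    c, s = math.cos(ct), math.sin(ct)
    # Rotate the world-frame offset into the current pose's frame.
    return (c * dx + s * dy, -s * dx + c * dy, ft - ct)

def to_absolute(current, relative):
    """Inverse of to_relative: map a relative pose back to the world frame."""
    cx, cy, ct = current
    rx, ry, rt = relative
    c, s = math.cos(ct), math.sin(ct)
    return (cx + c * rx - s * ry, cy + s * rx + c * ry, ct + rt)

# A gripper facing world +y that moves 0.1 units further along +y each step.
current = (1.0, 2.0, math.pi / 2)
future = [(1.0, 2.0 + 0.1 * k, math.pi / 2) for k in range(1, 4)]

relative = [to_relative(current, p) for p in future]
recovered = [to_absolute(current, r) for r in relative]

# The same motion started from a different world pose.
other = (-3.0, 0.5, 0.0)
other_future = [(-3.0 + 0.1 * k, 0.5, 0.0) for k in range(1, 4)]
other_relative = [to_relative(other, p) for p in other_future]
```

In this sketch both trajectories reduce to the same relative actions (a forward step along the gripper's own x-axis), which is the property that lets a policy trained on hand-held demonstrations ignore the absolute frame of whichever robot executes it.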