We present the Whole-Body Mobile Manipulation Interface (HoMMI), a data collection and policy learning framework that learns whole-body mobile manipulation directly from robot-free human demonstrations. We augment UMI interfaces with egocentric sensing to capture the global context required for mobile manipulation, enabling portable, robot-free, and scalable data collection. However, naively incorporating egocentric sensing widens the human-to-robot embodiment gap in both the observation and action spaces, making policy transfer difficult. We explicitly bridge this gap with a cross-embodiment hand-eye policy design, comprising an embodiment-agnostic visual representation, a relaxed head-action representation, and a whole-body controller that realizes hand-eye trajectories through coordinated whole-body motion under robot-specific physical constraints. Together, these components enable long-horizon mobile manipulation tasks requiring bimanual and whole-body coordination, navigation, and active perception. Results are best viewed at: https://hommi-robot.github.io
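The abstract does not specify how the whole-body controller is formulated. As a purely illustrative, hedged sketch of what "realizing hand-eye trajectories through coordinated whole-body motion" with a relaxed head task could look like, the snippet below solves a damped-least-squares velocity step over stacked hand and head task Jacobians, down-weighting the head task. All names, Jacobians, dimensions, and weights are hypothetical placeholders, not the paper's method.

```python
# Minimal sketch (NOT the paper's implementation): a damped-least-squares
# whole-body velocity step that tracks a desired hand pose while softly
# tracking a head pose, jointly commanding base and arm velocities.
import numpy as np

def whole_body_step(J_hand, J_head, err_hand, err_head,
                    w_head=0.3, damping=1e-2):
    """Stack the hand task and a down-weighted head task, then solve for
    base+arm velocities with damped least squares (placeholder Jacobians)."""
    J = np.vstack([J_hand, w_head * J_head])            # stacked task Jacobian
    e = np.concatenate([err_hand, w_head * err_head])   # stacked task error
    # qdot = J^T (J J^T + lambda^2 I)^{-1} e
    JJt = J @ J.T + (damping ** 2) * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(JJt, e)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_dof = 10                                   # e.g., 3 base + 7 arm DoF (illustrative)
    J_hand = rng.standard_normal((6, n_dof))     # placeholder hand Jacobian
    J_head = rng.standard_normal((6, n_dof))     # placeholder head Jacobian
    err_hand = 0.05 * rng.standard_normal(6)     # 6D hand pose error (twist)
    err_head = 0.05 * rng.standard_normal(6)     # 6D head pose error (twist)
    qdot = whole_body_step(J_hand, J_head, err_hand, err_head)
    print("commanded base+arm velocities:", np.round(qdot, 3))
```

The low head-task weight loosely mirrors the idea of a relaxed head action: the controller prioritizes the hand trajectory and lets the head target be satisfied only approximately, subject to the robot's own kinematic constraints.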