Current approaches for humanoid whole-body manipulation, primarily relying on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex reward engineering. Consequently, demonstrated autonomous skills remain limited and are typically restricted to controlled environments. In this paper, we present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks across various environments. HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware. This data drives a hierarchical learning pipeline that translates human motions into dexterous and feasible humanoid skills. Extensive experiments across five whole-body tasks--including kneeling, squatting, tossing, walking, and bimanual manipulation--demonstrate that HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.