Robotic foundation models trained on large-scale manipulation datasets have shown promise in learning generalist policies, but dataset biases cause them to overfit to specific viewpoints, robot arms, and especially parallel-jaw grippers. To address this limitation, we propose the Cross-Embodiment Interface (\CEI), a framework for cross-embodiment learning that enables the transfer of demonstrations across different robot arm and end-effector morphologies. \CEI introduces the notion of \textit{functional similarity}, quantified with the Directional Chamfer Distance; guided by this metric, it aligns robot trajectories via gradient-based optimization and then synthesizes observations and actions for unseen robot arms and end-effectors. In experiments, \CEI transfers data and policies from a Franka Panda robot to \textbf{16} different embodiments across \textbf{3} tasks in simulation, and supports bidirectional transfer between a UR5 with an AG95 gripper and a UR5 with an Xhand across \textbf{6} real-world tasks, achieving an average transfer ratio of 82.4\%. Finally, we show that \CEI can be extended with spatial generalization and multimodal motion generation capabilities using our proposed techniques. Project website: https://cross-embodiment-interface.github.io/
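To make the similarity metric concrete, below is a minimal sketch of a one-sided (directional) Chamfer distance between two point clouds, assuming each cloud is given as an $N \times 3$ NumPy array; the function name and the exact formulation (e.g. mean vs. sum, squared vs. Euclidean distances) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def directional_chamfer(P, Q):
    """One-sided Chamfer distance from cloud P to cloud Q.

    For every point in P, find its nearest neighbor in Q and
    average those distances. Note the metric is asymmetric:
    directional_chamfer(P, Q) != directional_chamfer(Q, P) in general.
    (Illustrative sketch; the paper may use a different variant.)
    """
    # Pairwise Euclidean distances: shape (len(P), len(Q))
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    # Nearest-neighbor distance in Q for each point of P, then average
    return d.min(axis=1).mean()
```

Because the measure is directional, which cloud plays the role of `P` (e.g. the source end-effector geometry) and which plays `Q` matters; a symmetric Chamfer distance would average both directions.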