We present a scalable framework for cross-embodiment humanoid robot control that learns a shared latent representation unifying motion across humans and diverse humanoid platforms, including single-arm, dual-arm, and legged humanoid robots. Our method proceeds in two stages. First, we construct a decoupled latent space that captures localized motion patterns across different body parts using contrastive learning, enabling accurate and flexible motion retargeting even between robots with diverse morphologies. To strengthen alignment between embodiments, we introduce tailored similarity metrics that combine joint rotation and end-effector position for critical segments such as the arms. Second, we train a goal-conditioned control policy directly in this latent space using only human data. Leveraging a conditional variational autoencoder, the policy learns to predict latent-space displacements guided by intended goal directions. We show that the trained policy can be deployed on multiple robots without any adaptation. Furthermore, our method supports the efficient addition of new robots to the latent space by learning only a lightweight, robot-specific embedding layer; the learned latent policies then transfer directly to these new robots. Experimental results demonstrate that our approach enables robust, scalable, and embodiment-agnostic control across a wide range of humanoid platforms.
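The tailored similarity metric described above combines joint rotation and end-effector position. A minimal sketch of one plausible form is below, assuming a geodesic distance between joint rotation matrices and a Euclidean end-effector position error; the function names, weighting scheme, and weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def geodesic_rotation_distance(R1, R2):
    """Geodesic distance (in radians) between two 3x3 rotation matrices."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def arm_similarity(rots_a, rots_b, ee_a, ee_b, w_rot=1.0, w_ee=1.0):
    """Combined distance for a critical segment (e.g. an arm):
    mean per-joint rotation distance plus weighted end-effector
    position error. Lower values mean more similar poses.
    (Hypothetical weighting; the paper's exact metric may differ.)"""
    rot_term = np.mean([geodesic_rotation_distance(Ra, Rb)
                        for Ra, Rb in zip(rots_a, rots_b)])
    ee_term = np.linalg.norm(np.asarray(ee_a) - np.asarray(ee_b))
    return w_rot * rot_term + w_ee * ee_term
```

Blending the two terms this way penalizes both mismatched joint configurations and mismatched hand placement, which matters when morphologies differ and identical joint angles would put end-effectors in different places.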
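The goal-conditioned policy predicts latent-space displacements that are integrated into a trajectory. A minimal rollout sketch under that assumption is shown below; the `policy` callable stands in for the CVAE decoder's output, and all names and the step count are illustrative, not the paper's implementation.

```python
import numpy as np

def rollout_latent_policy(policy, z0, goal_dir, steps=10):
    """Roll out a goal-conditioned policy in the shared latent space.

    At each step the policy predicts a displacement dz conditioned on
    the current latent state and the intended goal direction; the
    latent state is advanced by that displacement. Returns the latent
    trajectory as an array of shape (steps + 1, latent_dim)."""
    z = np.asarray(z0, dtype=float)
    traj = [z.copy()]
    for _ in range(steps):
        dz = policy(z, goal_dir)  # e.g. a sample or mean from a CVAE decoder
        z = z + dz
        traj.append(z.copy())
    return np.stack(traj)
```

Because the policy acts purely on latent states, the same rollout can drive any embodiment whose decoder maps the shared latent space to robot-specific joint commands, which is what allows deployment on new robots without retraining the policy.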