Generalist humanoid motion trackers have recently achieved strong simulation metrics by scaling data and training, yet they often remain brittle on hardware during sustained teleoperation because of interface- and dynamics-induced errors. We present MOSAIC, an open-source, full-stack system for humanoid motion tracking and whole-body teleoperation across multiple interfaces. MOSAIC first learns a teleoperation-oriented general motion tracker via reinforcement learning (RL) on a multi-source motion bank, using adaptive resampling and rewards that emphasize world-frame motion consistency, a property that is critical for mobile teleoperation. To bridge the sim-to-real interface gap without sacrificing generality, MOSAIC then performs rapid residual adaptation: an interface-specific policy is trained on minimal interface-specific data and then distilled into the general tracker through an additive residual module, an approach that outperforms naive fine-tuning and continual learning. We validate MOSAIC with systematic ablations, out-of-distribution benchmarking, and real-robot experiments that demonstrate robust offline motion replay and online long-horizon teleoperation under realistic latency and noise.
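The additive residual idea in the abstract can be illustrated with a toy sketch. The abstract does not specify network architectures, losses, or training details, so everything below is an assumption made for illustration: the general tracker and the interface-specific policy are stood in for by small fixed linear maps (`W_GENERAL`, `W_TEACHER`), the residual is also linear, and distillation is plain gradient descent on a mean squared action error. All names are hypothetical and none are taken from the MOSAIC code base.

```python
import random

# Hypothetical stand-ins for illustration only; none of these names come from MOSAIC.
W_GENERAL = [[1.0, -0.5], [0.3, 0.8]]  # frozen general tracker (toy linear policy)
W_TEACHER = [[1.2, -0.5], [0.2, 0.9]]  # interface-specific policy (toy teacher)


def linear(weights, obs):
    """Apply a linear map: one output per weight row."""
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]


def adapted_action(w_residual, obs):
    """General tracker output plus an additive residual correction."""
    base = linear(W_GENERAL, obs)
    delta = linear(w_residual, obs)
    return [b + d for b, d in zip(base, delta)]


def distill_residual(observations, steps=300, lr=0.5):
    """Fit the residual by gradient descent on the mean squared error between
    the adapted action and the teacher action. Only the residual weights are
    updated; W_GENERAL is never modified."""
    # Zero init: the adapted policy starts out identical to the general tracker.
    w_res = [[0.0, 0.0], [0.0, 0.0]]
    n = len(observations)
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for obs in observations:
            pred = adapted_action(w_res, obs)
            goal = linear(W_TEACHER, obs)
            for i in range(2):
                err = pred[i] - goal[i]
                for j in range(2):
                    grad[i][j] += 2.0 * err * obs[j] / n
        for i in range(2):
            for j in range(2):
                w_res[i][j] -= lr * grad[i][j]
    return w_res


def max_error(w_residual, observations):
    """Largest absolute gap between the adapted action and the teacher action."""
    return max(
        abs(p - g)
        for obs in observations
        for p, g in zip(adapted_action(w_residual, obs), linear(W_TEACHER, obs))
    )


rng = random.Random(0)
train_obs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(64)]
test_obs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(64)]

zero_residual = [[0.0, 0.0], [0.0, 0.0]]
w_residual = distill_residual(train_obs)
err_before = max_error(zero_residual, test_obs)
err_after = max_error(w_residual, test_obs)
print(f"max action error before: {err_before:.4f}, after: {err_after:.2e}")
```

In this toy setting the base weights stay frozen and only the small residual absorbs the interface-specific correction, which is one plausible reading of how an additive module could adapt a tracker without overwriting its general behavior. A real system would presumably use neural network policies and richer observations.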