For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface. HANDOFF is a mixture-of-experts student distilled from three complementary specialists (whole-body motion tracking with safety-filtered data, locomotion, and fall recovery) via multi-teacher KL distillation under a context-conditioned gating scheme. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task rollouts, powered by a VLM-driven agentic planner, with no task-specific data or controller fine-tuning.
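To make the distillation objective concrete, the following is a minimal sketch of a context-gated multi-teacher KL loss, not the authors' implementation. It assumes diagonal-Gaussian action distributions, a KL direction of teacher to student, and a hard one-hot gate derived from a context label; the abstract specifies none of these. The names `gaussian_kl`, `one_hot_gate`, and `gated_distillation_loss` are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over the action dimension."""
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = (
        np.log(std_q / std_p)
        + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q)
        - 0.5
    )
    return per_dim.sum(axis=-1)


def one_hot_gate(context_ids, num_teachers):
    """Hypothetical hard gate: each sample is supervised by exactly one teacher.

    context_ids: (B,) integer array, e.g. 0 = motion tracking,
    1 = locomotion, 2 = fall recovery (an assumed labelling).
    """
    return np.eye(num_teachers)[context_ids]


def gated_distillation_loss(teacher_mu, teacher_std, student_mu, student_std, gate):
    """Gate-weighted sum of per-teacher KL terms, averaged over the batch.

    teacher_mu, teacher_std: (K, B, A) outputs of K teachers
    student_mu, student_std: (B, A) outputs of the student
    gate: (B, K) non-negative weights, each row summing to 1
    """
    num_teachers = teacher_mu.shape[0]
    kl = np.stack(
        [
            gaussian_kl(teacher_mu[k], teacher_std[k], student_mu, student_std)
            for k in range(num_teachers)
        ],
        axis=1,
    )  # shape (B, K)
    return float((gate * kl).sum(axis=1).mean())
```

A soft gate (any row-stochastic `(B, K)` matrix) can be passed in place of the one-hot gate without changing the loss function. In a real training loop the student outputs would come from the mixture-of-experts network and the loss would be minimized with an autodiff framework.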