Humanoid loco-manipulation requires executing precise manipulation tasks while maintaining dynamic stability amid base motion and impacts. Existing approaches typically formulate commands in body-centric frames and therefore cannot inherently correct the cumulative world-frame drift induced by legged locomotion. We reformulate the problem as world-frame end-effector tracking and propose HiWET, a hierarchical reinforcement learning framework that decouples global reasoning from dynamic execution. The high-level policy generates subgoals that jointly optimize end-effector accuracy and base positioning in the world frame, while the low-level policy executes these commands under stability constraints. We introduce a Kinematic Manifold Prior (KMP) that embeds the manipulation manifold into the action space via residual learning, reducing exploration dimensionality and mitigating kinematically invalid behaviors. Extensive simulation and ablation studies demonstrate that HiWET achieves precise and stable end-effector tracking in long-horizon world-frame tasks. We also validate zero-shot sim-to-real transfer of the low-level policy on a physical humanoid, demonstrating stable locomotion under diverse manipulation commands. These results indicate that explicit world-frame reasoning combined with hierarchical control provides an effective and scalable solution for long-horizon humanoid loco-manipulation.
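The two ideas in the abstract, world-frame tracking and residual learning over a kinematic prior, can be illustrated with a minimal planar sketch. This is not the HiWET implementation: the function names `world_to_base` and `residual_action`, the SE(2) simplification, the clipping range, and the `scale` parameter are all assumptions made for illustration. The first function re-expresses a fixed world-frame target in the current base frame, which is why base drift shows up as a correctable error rather than accumulating silently. The second composes a kinematically derived prior action with a bounded learned correction, one common way to realize residual learning.

```python
import numpy as np

def world_to_base(target_world, base_xy, base_yaw):
    """Express a world-frame 2D point in the robot base frame (planar SE(2) assumption)."""
    c, s = np.cos(base_yaw), np.sin(base_yaw)
    # Rotation that maps base-frame vectors into the world frame.
    R = np.array([[c, -s], [s, c]])
    # Invert the pose: subtract the base position, then rotate by R^T.
    return R.T @ (np.asarray(target_world, dtype=float) - np.asarray(base_xy, dtype=float))

def residual_action(prior_action, policy_residual, scale=0.1):
    """Hypothetical residual composition: prior plus a bounded, scaled policy correction."""
    # Clipping keeps the learned correction within +/- scale of the prior.
    bounded = np.clip(np.asarray(policy_residual, dtype=float), -1.0, 1.0)
    return np.asarray(prior_action, dtype=float) + scale * bounded

# The world-frame target stays fixed while the base pose changes.
target = [1.0, 2.0]
in_base = world_to_base(target, base_xy=[1.0, 1.0], base_yaw=np.pi / 2)

# The prior would come from a kinematic model; here it is a placeholder vector.
action = residual_action([0.5, -0.2], [2.0, -0.5], scale=0.1)
```

Because the target is re-expressed from the current base pose at every step, any drift in the base pose changes `in_base` directly, whereas a command issued once in the body frame would carry the drift along with it.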