Learning natural, stable, and compositionally generalizable whole-body control policies for humanoid robots performing simultaneous locomotion and manipulation (loco-manipulation) remains a fundamental challenge in robotics. Existing reinforcement learning approaches typically rely on a single monolithic policy to acquire multiple skills, which often leads to cross-skill gradient interference and conflicting motion patterns in high-degree-of-freedom systems. As a result, the generated behaviors frequently exhibit unnatural movements, limited stability, and poor generalization to complex task compositions. To address these limitations, we propose MetaWorld-X, a hierarchical world model framework for humanoid control. Guided by a divide-and-conquer principle, our method decomposes complex control problems into a set of Specialized Expert Policies (SEPs). Each expert is trained under human motion priors through imitation-constrained reinforcement learning, which introduces biomechanically consistent inductive biases that ensure natural and physically plausible motion generation. Building on this foundation, we develop an Intelligent Routing Mechanism (IRM) supervised by a Vision-Language Model (VLM), enabling semantics-driven expert composition. The VLM-guided router dynamically integrates the expert policies according to high-level task semantics, facilitating compositional generalization and adaptive execution in multi-stage loco-manipulation tasks.
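To make the idea of routing over expert policies concrete, the following is a minimal sketch, not the MetaWorld-X implementation. The abstract does not specify how the router integrates experts, so this sketch assumes a softmax-weighted blend of expert actions. The names `blend_experts`, `walk_expert`, and `reach_expert`, and the hand-supplied `router_logits` that stand in for the output of a VLM-supervised router, are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Action = List[float]
Expert = Callable[[Sequence[float]], Action]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_experts(
    obs: Sequence[float],
    experts: Dict[str, Expert],
    router_logits: Dict[str, float],
) -> Action:
    """Blend expert actions with softmax routing weights.

    Assumption: the router emits one scalar score per expert and the
    final action is the weighted sum of the expert actions.
    """
    names = list(experts)
    weights = softmax([router_logits[n] for n in names])
    actions = [experts[n](obs) for n in names]
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]


# Toy stand-ins for trained expert policies (hypothetical).
def walk_expert(obs: Sequence[float]) -> Action:
    return [1.0, 0.0]


def reach_expert(obs: Sequence[float]) -> Action:
    return [0.0, 1.0]


experts = {"walk": walk_expert, "reach": reach_expert}

# Equal scores give an even blend of the two experts.
even_action = blend_experts([0.0], experts, {"walk": 0.0, "reach": 0.0})

# A strongly preferred expert dominates the blended action.
walk_action = blend_experts([0.0], experts, {"walk": 10.0, "reach": -10.0})
```

In this sketch the router scores are fixed by hand. In a framework like the one described, they would presumably depend on the observation and the task semantics, and could change at every control step as a multi-stage task progresses.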