Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, so naïve imitation degrades task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. Because the appropriate balance between task and imitation rewards varies with state and demonstration quality, no fixed weighting resolves both failure modes; this exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose \emph{Task-Centric Motion Priors} (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
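The task-priority update described above can be sketched as a gradient-conflict rule: keep the task gradient intact, and admit the imitation gradient only through its component compatible with task progress. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name `tcmp_update`, the weight `beta`, and the specific projection rule are assumptions for exposition.

```python
import numpy as np

def tcmp_update(g_task, g_imit, beta=0.5):
    """Hypothetical task-priority update direction.

    The task gradient is always preserved; the imitation gradient
    contributes only insofar as it does not oppose task descent.
    """
    dot = float(np.dot(g_task, g_imit))
    if dot >= 0.0:
        # Aligned case: imitation acts as a regularizer alongside the task.
        return g_task + beta * g_imit
    # Conflicting case: project out the component of g_imit that opposes
    # g_task, preserving a task-feasible descent direction.
    g_proj = g_imit - (dot / (np.dot(g_task, g_task) + 1e-12)) * g_task
    return g_task + beta * g_proj
```

By construction, the returned direction always has a nonnegative inner product with the task gradient, so the imitation signal can shape motion style without ever reversing task progress.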