Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, so naïve imitation degrades task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. Together, these failure modes expose a fundamental limitation of linearly mixing task and imitation rewards in adversarial imitation learning. We propose \emph{Task-Centric Motion Priors} (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
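The task-priority gating described above can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the paper's actual update rule: it gates the imitation gradient on its alignment with the task gradient, projecting out any component that opposes task descent, so the combined step never loses task-feasible descent. The function name, `beta` weight, and projection choice are assumptions for exposition.

```python
import numpy as np

def task_priority_update(g_task, g_imit, beta=1.0):
    """Combine task and imitation gradients with task priority.

    Hypothetical sketch: the imitation gradient contributes only the
    component compatible with the task gradient, so task-feasible
    descent is preserved even under misaligned demonstrations.
    """
    g_task = np.asarray(g_task, dtype=float)
    g_imit = np.asarray(g_imit, dtype=float)
    alignment = g_task @ g_imit
    if alignment >= 0.0:
        # Compatible: incorporate the full imitation signal.
        return g_task + beta * g_imit
    # Conflicting: remove the component of g_imit opposing g_task.
    g_proj = g_imit - (alignment / (g_task @ g_task + 1e-12)) * g_task
    return g_task + beta * g_proj
```

When the two gradients agree, imitation acts as a regularizer at full strength; when they conflict, only the task-orthogonal part of the imitation signal survives, which matches the abstract's claim that harmful imitation is suppressed rather than linearly averaged in.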