Distinguishing Imitation Error from Intrinsic Motion Learning Difficulty

Physics-based motion imitation is central to humanoid control, yet current evaluation metrics (e.g., MPJPE) quantify only imitation outcomes, not their underlying causes. This conflation obscures a critical diagnostic question: when imitation error occurs, does it stem from limitations of the policy or from the intrinsic learning difficulty of the target motion? To resolve this ambiguity, we propose the Torque Variation Score (TVS), a physics-grounded metric that quantifies the inherent learning difficulty of a motion independently of any policy's performance. TVS measures the magnitude of torque variation required to correct small pose perturbations, directly capturing how the motion's dynamical properties shape the reinforcement learning landscape. We show that high-TVS motions induce flat reward landscapes and vanishing policy gradients, which explains persistent imitation failures. Extensive experiments with state-of-the-art methods (UHC, PHC+) confirm that TVS correlates strongly with imitation error and enables principled error attribution: high error on low-TVS motions indicates a policy deficiency, whereas high error on high-TVS motions reflects fundamental learning constraints. Beyond error diagnosis, TVS supports three practical applications: Maximum Imitable Difficulty (MID) for assessing policy capability, Difficulty-Stratified Joint Error (DSJE) for fine-grained performance profiling, and Flawed Motion Detection for identifying segments with abnormally high learning difficulty, which supports mocap data curation and quality control. TVS thus provides a rigorous lens for distinguishing policy-induced errors from motion-inherent challenges and for improving the reliability of motion datasets.
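The abstract does not give the formula for TVS, so the following is only a rough illustration of the kind of quantity it is described as measuring. For each frame, the sketch estimates how much the torque demanded by inverse dynamics changes per unit of a small random pose perturbation. Everything here is a hypothetical stand-in chosen for illustration and is not the paper's definition: the toy model `pendulum_inverse_dynamics` (one independent pendulum per joint), the finite-difference estimator `torque_variation`, the perturbation size `eps`, and the fixed-threshold rule in `flag_high_difficulty`.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

InverseDynamics = Callable[[Sequence[float], Sequence[float]], List[float]]


def pendulum_inverse_dynamics(q: Sequence[float], qdd: Sequence[float],
                              mass: float = 1.0, length: float = 1.0,
                              g: float = 9.81) -> List[float]:
    """Hypothetical toy model: each joint is an independent point-mass pendulum.

    Required torque per joint: tau_i = m * l^2 * qdd_i + m * g * l * sin(q_i).
    """
    return [mass * length ** 2 * a + mass * g * length * math.sin(x)
            for x, a in zip(q, qdd)]


def torque_variation(q: Sequence[float], qdd: Sequence[float],
                     inv_dyn: InverseDynamics, eps: float = 1e-3,
                     n_samples: int = 16,
                     rng: Optional[random.Random] = None) -> float:
    """Assumed TVS-like proxy for one frame.

    Averages ||tau(q + delta) - tau(q)|| / ||delta|| over random pose
    perturbations delta of norm eps, with the target acceleration held fixed.
    """
    rng = rng or random.Random(0)  # fixed seed so scores are reproducible
    base = inv_dyn(q, qdd)
    total = 0.0
    for _ in range(n_samples):
        # Draw a random direction and rescale it to length eps.
        direction = [rng.gauss(0.0, 1.0) for _ in q]
        norm = math.sqrt(sum(d * d for d in direction))
        delta = [eps * d / norm for d in direction]
        perturbed = inv_dyn([x + d for x, d in zip(q, delta)], qdd)
        diff = math.sqrt(sum((p - b) ** 2 for p, b in zip(perturbed, base)))
        total += diff / eps
    return total / n_samples


def tvs_sequence(poses: Sequence[Sequence[float]],
                 accels: Sequence[Sequence[float]],
                 inv_dyn: InverseDynamics, **kwargs) -> List[float]:
    """Per-frame proxy scores for a motion clip (one score per frame)."""
    return [torque_variation(q, a, inv_dyn, **kwargs)
            for q, a in zip(poses, accels)]


def flag_high_difficulty(scores: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical flawed-segment rule: indices of frames scoring above threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]
```

In the toy model, the torque sensitivity of each joint is m·g·l·cos(q). Frames near q = 0 therefore score about 9.81, and frames near q = π/2 score close to zero. A real evaluation would replace the toy model with the humanoid's full rigid-body inverse dynamics.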

翻译：基于物理的运动模仿是人形机器人控制的核心，但当前评估指标（如MPJPE）仅量化模仿结果，而非其根本原因。这种混淆掩盖了一个关键的诊断问题：当模仿误差发生时，它源于策略局限性还是目标运动的内在学习难度？为解决这一模糊性，我们提出扭矩变化得分（TVS），这是一种基于物理的度量标准，可独立于任何策略性能量化运动固有的学习难度。TVS衡量纠正微小姿态扰动所需的扭矩变化幅度，直接捕捉动力学特性如何塑造强化学习景观。我们证明高TVS运动会引发平坦的奖励景观和消失的策略梯度，从而解释持续存在的模仿失败。采用最先进方法（UHC、PHC+）的大量实验证实，TVS与模仿误差高度相关，并能实现原则性误差归因：低TVS运动的高误差表明策略缺陷，而高TVS运动的高误差反映基本学习约束。除误差诊断外，TVS还支持三种实际应用：用于策略能力评估的最大可模仿难度（MID）、用于粒度性能分析的难度分层关节误差（DSJE），以及用于识别异常高学习难度片段以支持动作捕捉数据整理和质量控制的缺陷运动检测。TVS提供了严格视角来区分策略引发的误差与运动固有的挑战，并提升了运动数据集的可靠性。