Motion understanding aims to establish a reliable mapping between motion and action semantics, but this is a challenging many-to-many problem. An abstract action semantic (e.g., walking forwards) can be conveyed by perceptually diverse motions (walking with arms raised or swinging). Conversely, a single motion can carry different semantics depending on its context and intention. This makes an elegant mapping between the two difficult. Previous attempts adopted direct-mapping paradigms with limited reliability. Moreover, current automatic metrics fail to provide reliable assessments of the consistency between motions and action semantics. We identify the source of these problems as the significant gap between the two modalities. To alleviate this gap, we propose Kinematic Phrases (KP), which capture the objective kinematic facts of human motion with proper abstraction, interpretability, and generality. Based on KP, we can unify a motion knowledge base and build a motion understanding system. Meanwhile, KP can be automatically converted from motions to text descriptions with no subjective bias, inspiring Kinematic Prompt Generation (KPG) as a novel white-box motion generation benchmark. In extensive experiments, our approach shows superiority over other methods. Our project is available at https://foruck.github.io/KP/.
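To make the idea of an objective kinematic fact more concrete, the following is a minimal illustrative sketch, not the definition of KP used in this work. It assumes a motion is given as per-frame 3D positions of one joint, and that a phrase can be approximated by the sign of that joint's displacement along a chosen axis. The function names `axis_motion_signs` and `describe`, the threshold `eps`, and the axis convention are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def axis_motion_signs(
    positions: Sequence[Vec3], axis: Vec3, eps: float = 1e-3
) -> List[int]:
    """Return +1, -1, or 0 for each pair of consecutive frames.

    +1 means the joint moved along `axis`, -1 means it moved against it,
    and 0 means the projected displacement stayed within `eps`.
    Hypothetical helper: the thresholding scheme is an assumption.
    """
    signs = []
    for prev, curr in zip(positions, positions[1:]):
        # Project the frame-to-frame displacement onto the chosen axis.
        step = sum((c - p) * a for p, c, a in zip(prev, curr, axis))
        if step > eps:
            signs.append(1)
        elif step < -eps:
            signs.append(-1)
        else:
            signs.append(0)
    return signs


def describe(
    joint: str, pos_label: str, neg_label: str, signs: Sequence[int]
) -> str:
    """Turn per-frame signs into a short text description by net direction."""
    net = sum(signs)
    if net > 0:
        return f"{joint} moves {pos_label}"
    if net < 0:
        return f"{joint} moves {neg_label}"
    return f"{joint} shows no net movement {pos_label} or {neg_label}"


# Toy example: a pelvis joint advancing along the z axis (assumed "forwards").
pelvis = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.1), (0.0, 0.0, 0.2)]
signs = axis_motion_signs(pelvis, axis=(0.0, 0.0, 1.0))
print(describe("pelvis", "forwards", "backwards", signs))  # pelvis moves forwards
```

Because the description is derived only from joint positions and a fixed threshold, the same motion always yields the same text. This is the property the abstract refers to as conversion with no subjective bias.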