The goal of motion understanding is to establish a reliable mapping between motion and action semantics, but this is a challenging many-to-many problem. An abstract action semantic (e.g., walking forward) can be conveyed by perceptually diverse motions (walking with arms raised or swinging), while a single motion can carry different semantics depending on its context and intention. This makes an elegant mapping between the two difficult. Previous attempts adopted direct-mapping paradigms with limited reliability. Moreover, current automatic metrics fail to reliably assess the consistency between motions and action semantics. We identify the source of these problems as the significant gap between the two modalities. To alleviate this gap, we propose Kinematic Phrases (KP), which capture the objective kinematic facts of human motion with proper abstraction, interpretability, and generality. With KP as a mediator, we can unify a motion knowledge base and build a motion understanding system. Meanwhile, KP can be automatically extracted from motions and converted to text descriptions with no subjective bias, inspiring Kinematic Prompt Generation (KPG) as a novel automatic motion generation benchmark. In extensive experiments, our approach outperforms other methods. Our code and data will be made publicly available at https://foruck.github.io/KP.