The goal of motion understanding is to establish a reliable mapping between motion and action semantics, but this is a challenging many-to-many problem. An abstract action semantic (e.g., walk forwards) can be conveyed by perceptually diverse motions (walking with arms up or with arms swinging), while a single motion can carry different semantics depending on its context and intention. This makes a clean mapping between the two difficult. Previous attempts adopted direct-mapping paradigms with limited reliability. In addition, current automatic metrics fail to provide reliable assessments of the consistency between motions and action semantics. We identify the source of these problems as the significant gap between the two modalities. To narrow this gap, we propose Kinematic Phrases (KP), which capture the objective kinematic facts of human motion with appropriate abstraction, interpretability, and generality. Using KP as a mediator, we unify a motion knowledge base and build a motion understanding system. Moreover, KP can be automatically converted from motions and into text descriptions with no subjective bias, which inspires Kinematic Prompt Generation (KPG) as a novel automatic motion generation benchmark. In extensive experiments, our approach outperforms other methods. Our code and data will be made publicly available at https://foruck.github.io/KP.