Transformers are widely used in recent 3D human pose estimation, where their long-range modeling ability helps lift 2D keypoints into 3D space. However, current transformer-based methods do not fully exploit the prior knowledge of the human skeleton provided by its kinematic structure. In this paper, we propose EvoPose, a novel transformer-based model that effectively introduces human body prior knowledge into 3D human pose estimation. Specifically, a Structural Priors Representation (SPR) module represents human priors as structural features carrying rich body patterns, e.g., joint relationships. These structural features interact with 2D pose sequences and help the model obtain more informative spatiotemporal features. Moreover, a Recursive Refinement (RR) module refines the 3D pose outputs by utilizing the estimated results while simultaneously injecting further human priors. Extensive experiments demonstrate the effectiveness of EvoPose, which achieves state-of-the-art results on the two most popular benchmarks, Human3.6M and MPI-INF-3DHP.
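To make the notion of a kinematic-structure prior concrete, the following minimal sketch shows one way joint relationships could be read off a skeleton tree: a joint-adjacency matrix and per-bone lengths. It is an illustration only and not the SPR module itself; the 5-joint skeleton `PARENTS`, the helper names `adjacency` and `bone_lengths`, and the toy pose are all hypothetical assumptions introduced here.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (an assumption, not a benchmark layout):
# entry j holds the parent index of joint j, with -1 marking the root.
PARENTS = [-1, 0, 1, 0, 3]


def adjacency(parents):
    """Symmetric joint-adjacency matrix with self-loops, built from the tree."""
    n = len(parents)
    adj = np.eye(n)
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child, parent] = 1.0
            adj[parent, child] = 1.0
    return adj


def bone_lengths(pose, parents):
    """Length of every child-parent bone for a pose of shape (joints, dims)."""
    child = [j for j, p in enumerate(parents) if p >= 0]
    parent = [parents[j] for j in child]
    return np.linalg.norm(pose[child] - pose[parent], axis=-1)


# Toy 2D pose: two unit-length chains leaving the root along the y and x axes.
pose_2d = np.array(
    [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 0.0], [2.0, 0.0]]
)

A = adjacency(PARENTS)
lengths = bone_lengths(pose_2d, PARENTS)
```

Quantities of this kind depend only on the skeleton topology and bone geometry, which is why they can serve as priors that are independent of any particular input sequence.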