Recent 2D-to-3D human pose estimation (HPE) methods utilize temporal consistency across sequences to alleviate the depth ambiguity problem but ignore the action-related prior knowledge hidden in the pose sequence. In this paper, we propose a plug-and-play module named Action Prompt Module (APM) that effectively mines different kinds of action clues for 3D HPE. Notably, the mining scheme of APM can be widely adapted to different frameworks and brings consistent benefits. Specifically, we first present a novel Action-related Text Prompt module (ATP) that directly embeds action labels and transfers the rich language information in the labels to the pose sequence. We further introduce an Action-specific Pose Prompt module (APP) to mine the position-aware pose pattern of each action and exploit the correlation between the mined patterns and the input pose sequence for further pose refinement. Experiments show that APM can improve the performance of most video-based 2D-to-3D HPE frameworks by a large margin.
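To make the idea of transferring label information to the pose sequence more concrete, the following is a minimal sketch, not the ATP or APP design itself, whose details the abstract does not give. It assumes that each action label has already been embedded as a vector (random stand-ins here, in place of a real text encoder) and that each frame of the pose sequence has a feature vector of the same dimension. The function name `inject_action_prompts`, the scaled dot-product weighting, and the residual addition are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def inject_action_prompts(pose_feats, action_embeds):
    """Hypothetical helper: fuse action-label embeddings into per-frame features.

    pose_feats:    T frames, each a list of C floats.
    action_embeds: A action labels, each a list of C floats.
    Returns (refined features of shape T x C, attention weights of shape T x A).
    """
    dim = len(action_embeds[0])
    scale = math.sqrt(dim)
    refined, all_weights = [], []
    for frame in pose_feats:
        # Similarity between this frame and every action embedding.
        weights = softmax([dot(frame, emb) / scale for emb in action_embeds])
        # Weighted mixture of action embeddings acts as the "prompt" for this frame.
        prompt = [
            sum(w * emb[c] for w, emb in zip(weights, action_embeds))
            for c in range(dim)
        ]
        # Residual addition keeps the original pose feature intact.
        refined.append([f + p for f, p in zip(frame, prompt)])
        all_weights.append(weights)
    return refined, all_weights


# Toy sizes chosen only for illustration.
random.seed(0)
T, A, C = 9, 4, 8
pose_feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(T)]
action_embeds = [[random.gauss(0, 1) for _ in range(C)] for _ in range(A)]
refined, weights = inject_action_prompts(pose_feats, action_embeds)
```

Because the output has the same shape as the input features, a block of this kind could in principle be placed between existing layers of a 2D-to-3D lifting network without changing the rest of the pipeline, which is the sense in which such a module is plug-and-play.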