Occlusion is an omnipresent challenge in 3D human pose estimation (HPE). Despite the large body of research on 3D HPE, only a few studies address occlusion explicitly. To fill this gap, we propose combining spatio-temporal feature exploitation with synthetic occlusion augmentation during training. To this end, we build a spatio-temporal 3D HPE model, StridedPoseGraphFormer, based on graph convolution and transformers, and train it with occlusion augmentation. Unlike existing occlusion-aware methods, which are tested only under limited occlusion, we extensively evaluate our method under varying degrees of occlusion. We show that our proposed method compares favorably with the state of the art (SoA). Our experimental results also reveal that, in the absence of any occlusion-handling mechanism, the performance of SoA 3D HPE methods degrades significantly when they encounter occlusion.
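As an illustration of what synthetic occlusion augmentation could look like, the following is a minimal sketch and not the procedure used for StridedPoseGraphFormer. It assumes the model consumes 2D keypoint sequences of shape `(T, J, 2)` and that an occluded joint is represented by zeroing its coordinates; the function name `occlude_keypoints` and the parameter `p_joint` are hypothetical.

```python
import numpy as np


def occlude_keypoints(seq, p_joint=0.2, rng=None):
    """Randomly occlude joints in a 2D keypoint sequence.

    Assumptions (not from the paper): `seq` has shape (T, J, 2), and an
    occluded joint is simulated by setting its coordinates to zero.

    Returns the occluded copy and a boolean mask of shape (T, J),
    where True marks an occluded joint.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each joint in each frame is occluded independently with probability p_joint.
    mask = rng.random(seq.shape[:2]) < p_joint
    out = seq.copy()  # leave the input sequence untouched
    out[mask] = 0.0   # zero both coordinates of every occluded joint
    return out, mask
```

Varying `p_joint` at evaluation time is one simple way to probe a model under different degrees of occlusion; occlusion of contiguous frames or of whole body parts would need a different masking scheme.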