Heatmap-based methods have become the mainstream approach to pose estimation due to their superior performance. However, they suffer from significant quantization errors when heatmaps are downscaled, which limit performance and cause intermediate supervision to have detrimental effects. Previous heatmap-based methods rely heavily on additional post-processing to mitigate quantization errors, and some raise the resolution of feature maps with multiple costly upsampling layers to improve localization precision. To address these issues, we view the backbone network as a degradation process and reformulate heatmap prediction as a Super-Resolution (SR) task. We first propose the SR head, which uses super-resolution to predict heatmaps at a spatial resolution higher than that of the input feature maps (or even equal to that of the input image), effectively reducing the quantization error and the dependence on further post-processing. We further propose SRPose, which gradually recovers high-resolution (HR) heatmaps from low-resolution (LR) heatmaps and degraded features in a coarse-to-fine manner. To reduce the difficulty of training HR heatmaps, SRPose applies SR heads to supervise the intermediate features in each stage. In addition, the SR head is lightweight and generic, and applies to both top-down and bottom-up methods. Extensive experiments on the COCO, MPII, and CrowdPose datasets show that SRPose outperforms the corresponding heatmap-based approaches. The code and models are available at https://github.com/haonanwang0522/SRPose.
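The quantization error mentioned above can be illustrated with a toy calculation. The sketch below is not taken from the SRPose code: the function names and the sample keypoint are hypothetical, and it assumes the simplest possible decoding, in which a keypoint is snapped to the nearest heatmap cell and mapped back to input pixels without any sub-pixel post-processing.

```python
import math


def decode_error(x, y, stride):
    """Error (in input pixels) after snapping a keypoint to the nearest cell
    of a heatmap downscaled by `stride` and mapping it back.

    Hypothetical helper for illustration only.
    """
    # Encode: nearest heatmap cell, the best a plain argmax decode can recover.
    hx, hy = round(x / stride), round(y / stride)
    # Decode: map the cell index back to input-image coordinates.
    rx, ry = hx * stride, hy * stride
    return math.hypot(x - rx, y - ry)


def worst_case_error(stride):
    """Upper bound: each axis is off by at most stride / 2."""
    return stride / math.sqrt(2)


if __name__ == "__main__":
    keypoint = (101.3, 58.6)  # hypothetical ground-truth location
    for stride in (4, 2, 1):
        err = decode_error(*keypoint, stride)
        print(f"stride={stride}: error={err:.3f}, bound={worst_case_error(stride):.3f}")
```

Under these assumptions the worst-case error scales linearly with the stride, which is consistent with the motivation for predicting heatmaps at a resolution closer to that of the input image.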