Learning Local-Global Contextual Adaptation for Fully End-to-End Bottom-Up Human Pose Estimation

This paper presents a method of learning Local-GlObal Contextual Adaptation for fully end-to-end and fast bottom-up human Pose estimation, dubbed LOGO-CAP. It is built on the conceptually simple center-offset formulation, which by itself lacks accuracy for pose estimation. Revisiting bottom-up human pose estimation in the spirit of D. Kahneman's "Thinking, Fast and Slow", we introduce a "slow keypointer" to remedy the insufficient accuracy of the "fast keypointer". In learning the "slow keypointer", the proposed LOGO-CAP lifts the initial "fast" keypoints, obtained by offset prediction, to keypoint expansion maps (KEMs) and counters their uncertainty in two modules. First, local KEMs (e.g., 11x11) are extracted from a low-dimensional feature map. A proposed convolutional message passing module learns to "re-focus" the local KEMs into keypoint attraction maps (KAMs) by accounting for the structured output prediction nature of human pose estimation; this module is directly supervised by the object keypoint similarity (OKS) loss in training. Second, global KEMs are extracted, with a sufficiently large region of interest (e.g., 97x97), from keypoint heatmaps that are computed by a direct map-to-map regression. A local-global contextual adaptation module then convolves the global KEMs using the learned KAMs as the kernels. This convolution can be understood as a deformable and dynamic convolution, guided by learnable offsets, that operates in a pose-sensitive way. The proposed method is end-to-end trainable with near real-time inference speed, and it obtains state-of-the-art performance on the COCO keypoint benchmark for bottom-up human pose estimation. With the COCO-trained model, LOGO-CAP also outperforms prior work by a large margin on the challenging OCHuman dataset.
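To make the local-global contextual adaptation step more concrete, the following is a minimal sketch for a single keypoint, not the authors' implementation. It assumes the KAM is already given (here a hand-written kernel rather than a learned one), uses plain Python lists instead of batched tensors, and reads out the refined location with an argmax, which is an assumption of this sketch. The helper names `crop_patch`, `correlate_valid`, and `refine_keypoint` are hypothetical, and the toy sizes (3x3 kernel, 9x9 window) stand in for the 11x11 and 97x97 sizes mentioned above.

```python
def crop_patch(heatmap, cx, cy, size):
    """Crop a size x size window centred at (cx, cy); pixels outside the map are 0."""
    half = size // 2
    h, w = len(heatmap), len(heatmap[0])
    return [
        [heatmap[y][x] if 0 <= y < h and 0 <= x < w else 0.0
         for x in range(cx - half, cx + half + 1)]
        for y in range(cy - half, cy + half + 1)
    ]


def correlate_valid(patch, kernel):
    """'Valid' cross-correlation (no kernel flip, as in deep-learning convolution)."""
    ph, pw = len(patch), len(patch[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(patch[i + a][j + b] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(pw - kw + 1)]
        for i in range(ph - kh + 1)
    ]


def refine_keypoint(heatmap, init_xy, kam, global_size):
    """Refine one initial keypoint: the KAM acts as a per-keypoint kernel
    applied to the global window cropped around the initial location."""
    cx, cy = init_xy
    global_kem = crop_patch(heatmap, cx, cy, global_size)
    response = correlate_valid(global_kem, kam)
    # Argmax readout (an assumption of this sketch).
    _, bi, bj = max(
        ((v, i, j) for i, row in enumerate(response) for j, v in enumerate(row)),
        key=lambda t: t[0],
    )
    r = len(response) // 2  # the response centre maps back to (cx, cy)
    return (cx + bj - r, cy + bi - r)


# Toy data: a 15x15 heatmap with one peak at (x=9, y=6).
heatmap = [[0.0] * 15 for _ in range(15)]
heatmap[6][9] = 1.0

# Hand-written 3x3 "KAM" that puts all weight on its centre.
kam = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]

# The "fast" keypoint is slightly off; the "slow" step moves it to the peak.
refined = refine_keypoint(heatmap, init_xy=(7, 7), kam=kam, global_size=9)
print(refined)  # (9, 6)
```

Because the kernel differs per keypoint and the window is centred on a predicted offset, the operation behaves like a dynamic convolution whose sampling location is guided by the offset prediction, which matches the interpretation given in the abstract.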
