We introduce Joint Coordinate Regression and Association (JCRA), a novel one-stage, end-to-end algorithm for multi-person 2D pose estimation that produces human pose joints and their associations without any post-processing. The proposed algorithm is fast, accurate, effective, and simple. Its one-stage, end-to-end network architecture significantly improves inference speed, and a symmetric network structure for the encoder and decoder ensures high accuracy in identifying keypoints. JCRA outputs part positions directly via a transformer network, which yields a significant improvement in performance. Extensive experiments on the MS COCO and CrowdPose benchmarks demonstrate that JCRA outperforms state-of-the-art approaches in both accuracy and efficiency. In particular, JCRA achieves 69.2 mAP and runs inference 78\% faster than previous state-of-the-art bottom-up algorithms. The code will be made publicly available.