Human Pose Estimation (HPE) plays a crucial role in computer vision applications. However, state-of-the-art models are difficult to deploy on resource-limited devices because of their high computational cost. In this work, we propose a binary human pose estimator named BiHRNet (Binary HRNet), whose weights and activations are expressed as $\pm 1$. BiHRNet retains the keypoint extraction ability of HRNet while using fewer computing resources by adopting binary neural network (BNN) techniques. To reduce the accuracy drop caused by network binarization, we propose two categories of techniques. To optimize the training process of the binary pose estimator, we propose a new loss function that combines KL divergence loss with AWing loss, which lets the binary network learn a more comprehensive output distribution from its real-valued counterpart and thus reduces the information loss caused by binarization. To design more binarization-friendly structures, we propose a new information reconstruction bottleneck, called IR Bottleneck, that retains more information in the initial stage of the network. In addition, we propose a multi-scale basic block, called MS-Block, for information retention. BiHRNet reduces computational cost with only a small drop in accuracy. Experimental results demonstrate that BiHRNet achieves a PCKh of 87.9 on the MPII dataset, outperforming all binary pose estimation networks. On the challenging COCO dataset, the proposed method enables the binary neural network to achieve 70.8 mAP, which is better than most of the tested lightweight full-precision networks.
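To make the idea of the combined training loss concrete, the following is a minimal sketch and not the formulation used in this work. The abstract does not state how the two terms are combined, so the weighted sum, the `kl_weight` parameter, the function names, and the use of a softmax over a flattened heatmap for the KL term are all assumptions made for illustration. The AWing term follows the published Adaptive Wing loss definition with its commonly cited default hyperparameters.

```python
import math

def awing(pred, target, omega=14.0, theta=0.5, eps=1.0, alpha=2.1):
    """Adaptive Wing loss for a single heatmap pixel."""
    diff = abs(target - pred)
    power = alpha - target  # exponent adapts to the ground-truth value
    if diff < theta:
        # Non-linear branch for small errors
        return omega * math.log(1.0 + (diff / eps) ** power)
    # Linear branch; a and c keep the loss continuous at diff == theta
    ratio = (theta / eps) ** power
    a = omega * (1.0 / (1.0 + ratio)) * power * (theta / eps) ** (power - 1.0) / eps
    c = theta * a - omega * math.log(1.0 + ratio)
    return a * diff - c

def softmax(values):
    """Numerically stable softmax over a flat list of values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher, student):
    """KL(teacher || student) over softmax-normalized heatmaps (assumed normalization)."""
    p = softmax(teacher)
    q = softmax(student)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def combined_loss(student, teacher, target, kl_weight=1.0):
    """Hypothetical combination: mean AWing to the ground truth plus weighted KL to the teacher."""
    aw = sum(awing(s, t) for s, t in zip(student, target)) / len(target)
    return aw + kl_weight * kl_divergence(teacher, student)

# Flattened toy heatmaps: ground truth, real-valued teacher output, binary student output
target = [0.0, 0.2, 1.0, 0.2]
teacher = [0.05, 0.25, 0.9, 0.2]
student = [0.2, 0.3, 0.6, 0.3]
loss = combined_loss(student, teacher, target)
```

In this sketch the AWing term ties the binary network to the ground-truth heatmap, while the KL term pulls its output distribution toward that of the real-valued network. The relative weight of the two terms would need to be tuned.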