Human pose estimation is a fundamental and challenging task in computer vision. Larger and more accurate sets of keypoint annotations improve the accuracy of supervised pose estimation, but they are often expensive and difficult to obtain. Semi-supervised pose estimation leverages large amounts of unlabeled data to improve model performance, alleviating the shortage of labeled samples. Recent semi-supervised methods typically adopt a teacher-student framework with weak and strong data augmentation to cope with the diversity of human poses and their long-tailed distribution. An appropriate data augmentation method is one of the key factors affecting the accuracy and generalization of semi-supervised models. Because fixed keypoint masking augmentation does not account for differences in how individual samples are learned, this paper proposes an adaptive keypoint masking method that makes fuller use of the information in each sample and achieves better estimation performance. To further improve the generalization and robustness of the model, this paper also proposes a dual-branch data augmentation scheme that applies Mixup to both samples and features on top of adaptive keypoint masking. The effectiveness of the proposed method is verified on COCO and MPII, where it outperforms state-of-the-art semi-supervised pose estimation methods by 5.2% and 0.3%, respectively.
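The abstract names two augmentation ingredients, keypoint masking and Mixup, without giving their exact formulation. The following is only a minimal sketch of the generic forms of both, not the paper's implementation. The function names `mask_keypoints` and `mixup`, the square patch shape, the `patch` and `alpha` parameters, and the Beta-distributed mixing weight (the standard Mixup formulation) are all assumptions made for illustration. In particular, the rule that adapts the number of masked keypoints to each sample is not described in the abstract, so `num_masked` is simply passed in by the caller.

```python
import numpy as np


def mask_keypoints(image, keypoints, num_masked, patch=8, rng=None):
    """Zero out square patches around randomly chosen keypoints.

    image:      array of shape (H, W, C)
    keypoints:  array of shape (K, 2) holding (x, y) pixel coordinates
    num_masked: how many keypoints to mask for this sample (assumed to be
                supplied by some per-sample rule that is not shown here)
    patch:      half-width of the masked square, in pixels (hypothetical)
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = out.shape[:2]
    count = min(num_masked, len(keypoints))
    chosen = rng.choice(len(keypoints), size=count, replace=False)
    for idx in chosen:
        x, y = keypoints[idx]
        # Clip the patch to the image bounds before zeroing it.
        x0, x1 = max(int(x) - patch, 0), min(int(x) + patch, w)
        y0, y1 = max(int(y) - patch, 0), min(int(y) + patch, h)
        out[y0:y1, x0:x1] = 0
    return out


def mixup(batch, alpha=1.0, rng=None):
    """Standard Mixup: blend each item with a randomly paired item.

    Works on any array whose first axis is the batch axis, so the same
    function can be applied to input images or to intermediate features.
    Returns the mixed batch, the pairing permutation, and the weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(batch))    # partner index for each item
    mixed = lam * batch + (1.0 - lam) * batch[perm]
    return mixed, perm, lam
```

In a teacher-student setup, a sketch like this would typically be applied on the strongly augmented (student) branch, with the same `perm` and `lam` used to blend the corresponding targets; how the paper combines the two branches is not specified in the abstract.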