RGB-based 3D pose estimation methods have been successful with the development of deep learning and the emergence of high-quality 3D pose datasets. However, most existing methods do not perform well on test images whose distribution is far from that of the training data. This problem might be alleviated by involving diverse data during training; however, it is non-trivial to collect such diverse data with corresponding labels (i.e., 3D poses). In this paper, we introduce an unsupervised domain adaptation framework for 3D pose estimation that utilizes unlabeled data in addition to labeled data via a masked image modeling (MIM) framework. Foreground-centric reconstruction and attention regularization are further proposed to increase the effectiveness of the unlabeled data. Experiments are conducted on various datasets for human and hand pose estimation tasks, especially under cross-domain scenarios. We demonstrate the effectiveness of our method by achieving state-of-the-art accuracy on all datasets.