Despite the significant progress that depth-based 3D hand pose estimation methods have made in recent years, they still require a large amount of labeled training data to achieve high accuracy. However, collecting such data is both costly and time-consuming. To tackle this issue, we propose a semi-supervised method to significantly reduce the dependence on labeled training data. The proposed method consists of two identical networks trained jointly: a teacher network and a student network. The teacher network is trained using both the available labeled and unlabeled samples. It leverages the unlabeled samples via a loss formulation that encourages estimation equivariance under a set of affine transformations. The student network is trained using the unlabeled samples with their pseudo-labels provided by the teacher network. For inference at test time, only the student network is used. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art semi-supervised methods by large margins.
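As a minimal sketch of the two unlabeled-data losses described above: the abstract does not give their exact form, so the code below assumes a squared-error penalty, uses a horizontal flip as the single affine transformation, and substitutes a hypothetical `toy_estimator` (an intensity-weighted centroid) for the actual pose network. All function names are illustrative and are not taken from the paper.

```python
import numpy as np

def toy_estimator(depth):
    """Hypothetical stand-in for a pose network: returns one 'joint',
    the intensity-weighted centroid (x, y) of the depth map."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = depth.sum()
    return np.array([(xs * depth).sum() / total,
                     (ys * depth).sum() / total])

def flip_image(depth):
    """Horizontal flip of the input image (one example of an affine map)."""
    return depth[:, ::-1]

def flip_points(points, width):
    """The same flip applied to (x, y) pixel coordinates: x -> (width - 1) - x."""
    out = np.array(points, dtype=float)
    out[..., 0] = (width - 1) - out[..., 0]
    return out

def equivariance_loss(estimator, depth):
    """Assumed squared-error form: the estimate on the transformed input
    should match the transformed estimate on the original input."""
    pred_of_transformed = estimator(flip_image(depth))
    transformed_pred = flip_points(estimator(depth), depth.shape[1])
    return float(np.mean((pred_of_transformed - transformed_pred) ** 2))

def student_pseudo_label_loss(student, teacher, depth):
    """Assumed squared-error form: the teacher's estimate on an unlabeled
    sample is treated as a fixed pseudo-label for the student."""
    pseudo_label = teacher(depth)
    return float(np.mean((student(depth) - pseudo_label) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.random((6, 8)) + 0.1  # synthetic, strictly positive "depth map"
    print(equivariance_loss(toy_estimator, depth))
    print(student_pseudo_label_loss(toy_estimator, toy_estimator, depth))
```

The centroid estimator is exactly equivariant under the flip, so its equivariance loss is zero up to floating-point error, whereas an estimator that ignores its input would generally incur a positive loss. In practice the transformation set would also include rotations, translations, and scalings, and both estimators would be trainable networks.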