Depth-based 3D hand pose estimation is an important but challenging research task in the human-machine interaction community. Recently, dense regression methods have attracted increasing attention for this task because they achieve high accuracy at a low computational cost by densely regressing hand joint offset maps. However, large-scale regressed offset values are often affected by noise and outliers, leading to a significant drop in accuracy. To tackle this, we reformulate 3D hand pose estimation as a dense ordinal regression problem and propose a novel Dense Ordinal Regression 3D Pose Network (DOR3D-Net). Specifically, we first decompose offset value regression into binary classification sub-tasks with ordinal constraints. Each binary classifier then predicts the probability of a binary spatial relationship relative to the joint, which is easier to train and yields a much lower level of noise. The hand joint positions are then inferred by aggregating the ordinal regression results at local positions with a weighted sum. Furthermore, both a joint regression loss and an ordinal regression loss are used to train DOR3D-Net in an end-to-end manner. Extensive experiments on public datasets (ICVL, MSRA, NYU and HANDS2017) show that our design provides significant improvements over state-of-the-art methods.
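The decomposition and aggregation described in the abstract can be illustrated with a minimal sketch. It is not the DOR3D-Net implementation: the function names, the uniform thresholds over an assumed offset range `[lo, hi]`, the decoding rule (lower bound plus bin width times the sum of classifier probabilities), and the normalized per-position weights are all assumptions made for illustration, and the paper's exact formulation may differ.

```python
# Illustrative sketch only: names, threshold layout and decoding rule are
# assumptions, not taken from the DOR3D-Net paper.

def ordinal_encode(offset, lo, hi, num_bins):
    """Turn one scalar offset into num_bins binary labels.

    Label k answers the binary question "is the offset greater than
    threshold k?". Thresholds are uniformly spaced (an assumption), so the
    labels are non-increasing in k, which is the ordinal constraint.
    """
    width = (hi - lo) / num_bins
    thresholds = [lo + (k + 0.5) * width for k in range(num_bins)]
    return [1 if offset > t else 0 for t in thresholds]


def ordinal_decode(probs, lo, hi):
    """Recover an offset from the per-threshold probabilities.

    Summing the probabilities gives the expected number of thresholds
    exceeded; scaling by the bin width maps that count back to an offset.
    """
    width = (hi - lo) / len(probs)
    return lo + width * sum(probs)


def aggregate_joint(positions, offsets, weights):
    """Weighted sum of the per-position votes (position + decoded offset)."""
    total = sum(weights)
    return tuple(
        sum(w * (p[d] + o[d]) for p, o, w in zip(positions, offsets, weights)) / total
        for d in range(3)
    )
```

With `lo=-4.0`, `hi=4.0` and 8 bins, an offset of 1.2 is encoded as five ones followed by three zeros and decoded back to 1.0, so the quantization error in this sketch is bounded by half a bin width. In a trained network the hard labels would be replaced by predicted probabilities, and the weights passed to `aggregate_joint` would come from the network rather than being fixed by hand.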