Current parametric models have made notable progress in 3D hand pose and shape estimation. However, because of the fixed hand topology and the complexity of hand poses, these models struggle to generate meshes that align well with the image. To tackle this issue, we introduce a dual noise estimation method. Given a single-view image as input, we first adopt a baseline parametric regressor to obtain coarse hand meshes. We assume that the mesh vertices and their image-plane projections are noisy and can be associated in a unified probabilistic model. We then learn the noise distributions to refine the mesh vertices and their projections. The refined vertices are further used to refine the camera parameters in closed form. Consequently, our method obtains well-aligned, high-quality 3D hand meshes. Extensive experiments on the large-scale Interhand2.6M dataset demonstrate that the proposed method not only improves the performance of its baseline by more than 10$\%$ but also achieves state-of-the-art performance. Project page: \url{https://github.com/hanhuili/DNE4Hand}.
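To make the closed-form camera refinement step more concrete, the following is a minimal sketch of one way such a fit could look. It assumes a weak-perspective camera with a single scale `s` and a 2D translation `(tx, ty)`, which the abstract does not specify; the function name `fit_weak_perspective` and its inputs are hypothetical and are not taken from the DNE4Hand code. Given refined 3D vertices and their refined 2D projections, the least-squares solution for `s` and the translation follows directly from the centroids and the centred coordinates.

```python
from typing import Sequence, Tuple


def fit_weak_perspective(
    vertices: Sequence[Sequence[float]],
    projections: Sequence[Sequence[float]],
) -> Tuple[float, Tuple[float, float]]:
    """Least-squares fit of (s, tx, ty) so that s * (x, y) + (tx, ty) ~ (u, v).

    Assumption: a weak-perspective camera, so vertex depth (z) is ignored.
    """
    n = len(vertices)
    if n == 0 or n != len(projections):
        raise ValueError("need equally many vertices and projections")

    # Centroids of the 3D x/y coordinates and of the 2D points.
    mx = sum(v[0] for v in vertices) / n
    my = sum(v[1] for v in vertices) / n
    mu = sum(p[0] for p in projections) / n
    mv = sum(p[1] for p in projections) / n

    # Scale: cross-covariance divided by the variance of the centred x/y.
    num = 0.0
    den = 0.0
    for v, p in zip(vertices, projections):
        dx, dy = v[0] - mx, v[1] - my
        num += dx * (p[0] - mu) + dy * (p[1] - mv)
        den += dx * dx + dy * dy
    if den == 0.0:
        raise ValueError("degenerate vertices: no spread in x/y")
    s = num / den

    # Translation maps the scaled 3D centroid onto the 2D centroid.
    return s, (mu - s * mx, mv - s * my)
```

Because the objective is quadratic in `s` and the translation, no iterative optimisation is needed, which is the sense in which a fit like this is closed-form. A full perspective camera or a per-axis scale would need a different derivation.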