Estimating 3D hand meshes from RGB images is a longstanding problem, in which occlusion is one of the most challenging issues. Existing methods often fail when occlusion dominates the image. In this paper, we propose SiMA-Hand, which aims to boost mesh reconstruction performance through Single-to-Multi-view Adaptation (SiMA). First, we design a multi-view hand reconstructor that fuses information across multiple views by holistically applying feature fusion at the image, joint, and vertex levels. Then, we introduce a single-view hand reconstructor equipped with SiMA. Although it takes only one view as input at inference, the shape and orientation features in the single-view reconstructor can be enriched by learning non-occluded knowledge from the extra views during training, which improves reconstruction precision in occluded regions. We conduct experiments on the Dex-YCB and HanCo benchmarks with challenging object- and self-caused occlusion cases, showing that SiMA-Hand consistently outperforms state-of-the-art methods. Code will be released at https://github.com/JoyboyWang/SiMA-Hand Pytorch.
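The single-to-multi-view idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion rule or the training objective, so everything below is an assumption made for illustration: `fuse_views` averages per-view feature vectors as a stand-in for the multi-view fusion, and `adaptation_loss` uses a mean squared error to pull single-view features toward the fused multi-view target during training. Both function names are hypothetical and are not taken from the SiMA-Hand code.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_views(view_features: Sequence[Vector]) -> Vector:
    """Fuse per-view feature vectors by element-wise averaging.

    Assumption: plain averaging stands in for the multi-view fusion;
    the abstract does not specify the actual fusion operator.
    """
    if not view_features:
        raise ValueError("at least one view is required")
    dim = len(view_features[0])
    if any(len(v) != dim for v in view_features):
        raise ValueError("all views must have the same feature dimension")
    n_views = len(view_features)
    return [sum(v[i] for v in view_features) / n_views for i in range(dim)]


def adaptation_loss(single_view: Vector, fused_target: Vector) -> float:
    """Mean squared error between single-view and fused multi-view features.

    Assumption: an MSE penalty is one simple way to encourage the
    single-view features to match the multi-view target at training time.
    """
    if len(single_view) != len(fused_target):
        raise ValueError("feature dimensions must match")
    squared = [(s - t) ** 2 for s, t in zip(single_view, fused_target)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Toy features: the first view plays the role of the occluded input view.
    views = [[1.0, 2.0], [3.0, 4.0]]
    target = fuse_views(views)
    print(target)
    print(adaptation_loss(views[0], target))
```

In this sketch the extra views are needed only to build the training target; at inference, only the single-view features would be computed, which matches the single-view input setting stated in the abstract.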