We present WildHands, a method for 3D hand pose estimation from egocentric images in the wild. The task is challenging because (a) in-the-wild images lack 3D hand pose annotations, and (b) analyzing crops around hands introduces a shape ambiguity induced by perspective distortion. To address the former, we supplement the 3D supervision available in lab datasets with auxiliary supervision on in-the-wild data in the form of segmentation masks and grasp labels. To address the latter, we provide spatial cues about the location of the hand crop in the camera's field of view. Our approach achieves the best 3D hand pose estimation results on the ARCTIC leaderboard and outperforms FrankMocap, a popular and robust approach for estimating hand pose in the wild, by 45.3% when evaluated on 2D hand pose on our EPIC-HandKps dataset.