In this survey, we present a systematic review of 3D hand pose estimation from the perspective of efficient annotation and learning. 3D hand pose estimation is an important research area because it enables applications such as video understanding, AR/VR, and robotics. However, model performance depends on the quality and quantity of annotated 3D hand poses. Acquiring such annotations remains challenging, for example because of the difficulty of 3D annotation and the presence of occlusion. To clarify this problem, we review the pros and cons of existing annotation methods, which we classify as manual, synthetic-model-based, hand-sensor-based, and computational approaches. We also examine methods for learning 3D hand poses when annotated data are scarce, including self-supervised pretraining, semi-supervised learning, and domain adaptation. Based on this study of efficient annotation and learning, we discuss limitations and possible future directions in the field.