We present HUP-3D, a 3D multi-view multi-modal synthetic dataset for hand-ultrasound (US) probe pose estimation in the context of obstetric ultrasound. Egocentric markerless 3D joint pose estimation has potential applications in mixed reality-based medical education. The ability to understand hand and probe movements programmatically opens the door to tailored guidance and mentoring applications. Our dataset consists of over 31k sets of RGB, depth, and segmentation mask frames, including pose-related ground-truth data, with a strong emphasis on image diversity and complexity. Adopting a camera viewpoint-based sphere concept allows us to capture a variety of views and generate multiple hand grasp poses using a pre-trained network. Additionally, our approach includes a software-based image rendering concept, enhancing diversity with various hand and arm textures, lighting conditions, and background images. Furthermore, we validated our proposed dataset with state-of-the-art learning models and obtained the lowest hand-object keypoint errors. The dataset and other details are provided in the supplementary material. The source code of our grasp generation and rendering pipeline will be made publicly available.
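For illustration only: the abstract refers to a camera viewpoint-based sphere concept for capturing a variety of views. The minimal sketch below shows one common way such viewpoints could be sampled, using a Fibonacci lattice on a sphere and a look-at rotation toward the hand-probe origin. The function names, the radius parameter, and the sampling scheme are assumptions for this sketch, not details taken from the HUP-3D pipeline.

```python
import numpy as np

def fibonacci_sphere_viewpoints(n_views: int, radius: float = 0.4) -> np.ndarray:
    """Return n_views camera positions spread roughly evenly on a sphere
    of the given radius (metres), centred on the hand-probe origin.
    (Hypothetical helper; the actual HUP-3D sampling may differ.)"""
    i = np.arange(n_views)
    golden = np.pi * (3.0 - np.sqrt(5.0))        # golden-angle increment
    z = 1.0 - 2.0 * (i + 0.5) / n_views          # z in (-1, 1)
    r = np.sqrt(1.0 - z * z)
    theta = golden * i
    pts = np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
    return radius * pts

def look_at(cam_pos: np.ndarray, target: np.ndarray = np.zeros(3)) -> np.ndarray:
    """World-to-camera rotation pointing the camera's -z axis at target."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    up = np.array([0.0, 0.0, 1.0])
    right = np.cross(forward, up)
    if np.linalg.norm(right) < 1e-6:             # camera directly above/below
        right = np.array([1.0, 0.0, 0.0])
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward])  # rows form the rotation matrix

if __name__ == "__main__":
    cams = fibonacci_sphere_viewpoints(64)
    R = look_at(cams[0])
    print(cams.shape, R.shape)                   # (64, 3) (3, 3)
```

Each sampled position plus its look-at rotation would define one synthetic camera pose from which RGB, depth, and segmentation frames could be rendered.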