State-of-the-art research in traditional computer vision is increasingly being leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with purely image-based 6DoF pose estimation. However, the state of the art has not yet reached the accuracy required for surgical navigation. In this context, we propose a high-fidelity marker-less optical tracking system for surgical instrument localization. We developed a multi-view camera setup consisting of static and mobile cameras and collected a large-scale RGB-D video dataset with dedicated synchronization and data fusion methods. Different state-of-the-art pose estimation methods were integrated into a deep learning pipeline and evaluated on multiple camera configurations. Furthermore, we compared the impact on performance of different input modalities and camera positions, as well as of training on purely synthetic data. The best model achieved an average position and orientation error of 1.3 mm and 1.0° for a surgical drill, and 3.8 mm and 5.2° for a screwdriver. These results significantly outperform related methods in the literature and are close to clinical-grade accuracy, demonstrating that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.
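The abstract reports average position errors in millimetres and orientation errors in degrees, but it does not state how these are computed. The sketch below is an illustration only. It assumes that position error is the Euclidean distance between the estimated and ground-truth instrument positions and that orientation error is the geodesic angle between the two rotation matrices. This is a common convention for 6DoF pose evaluation, not necessarily the authors' exact metric. All function names are hypothetical.

```python
import math


def position_error_mm(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions (both assumed in mm)."""
    return math.dist(t_est, t_gt)


def orientation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists).

    Assumed metric: angle = arccos((trace(R_est^T R_gt) - 1) / 2).
    """
    # trace(R_est^T R_gt) equals the element-wise dot product of the two matrices
    tr = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before arccos
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def mean_pose_errors(estimates, ground_truths):
    """Average position (mm) and orientation (deg) errors over paired (R, t) poses."""
    pos = [position_error_mm(t_e, t_g)
           for (_, t_e), (_, t_g) in zip(estimates, ground_truths)]
    rot = [orientation_error_deg(R_e, R_g)
           for (R_e, _), (R_g, _) in zip(estimates, ground_truths)]
    return sum(pos) / len(pos), sum(rot) / len(rot)
```

Suppose one estimate is offset by (3, 4, 0) mm with the correct rotation, and a second has the correct position but is rotated 90° about the z-axis. Then `mean_pose_errors` averages 5 mm and 0 mm to 2.5 mm, and 0° and 90° to 45°.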