Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially in tracking scenarios where tools move out of the body or out of the camera view. Addressing this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories from three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, each representing a different temporal duration of a tool track. These fine-grained labels enhance tracking flexibility but also increase task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging due to high visual similarity, especially among tools of the same category. This work recognizes the critical role of tool operators in distinguishing tool track instances, especially those belonging to the same tool category. The operators' information is, however, not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of the tools, as a proxy for their operators, for tool re-identification. To handle diverse tool trajectory perspectives, SurgiTrack employs a harmonizing bipartite matching graph that minimizes conflicts and ensures accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness: it outperforms baselines and state-of-the-art methods while achieving real-time inference. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgeries.
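Identity association in tracking-by-detection is commonly posed as a minimum-cost bipartite matching between existing tracks and new detections. The abstract does not specify SurgiTrack's cost terms or how its harmonizing graph reconciles the three trajectory perspectives, so the following is only a minimal single-perspective sketch under stated assumptions: each track and detection is represented by a hypothetical feature vector (standing in for appearance or operator-direction cues), the cost is 1 minus cosine similarity, and a hypothetical `gate` threshold rejects weak matches. The names `hungarian`, `associate`, and `gate` are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment on a square matrix (Kuhn-Munkres with potentials).

    Returns assign[i] = column matched to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new reduced cost becomes zero.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat zero vectors as dissimilar
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def associate(tracks, detections, gate: float = 0.5):
    """Match tracks to detections; returns (matches, unmatched_tracks, unmatched_dets).

    Hypothetical cost: 1 - cosine similarity, clipped at `gate`. The matrix is
    padded to square with dummy entries of cost `gate`, so leaving a track or
    detection unmatched costs the same as a match at the gate.
    """
    n, m = len(tracks), len(detections)
    size = max(n, m)
    if size == 0:
        return [], [], []
    cost = [[gate] * size for _ in range(size)]
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            cost[i][j] = min(gate, 1.0 - cosine(t, d))
    assign = hungarian(cost)
    matches: List[Tuple[int, int]] = [
        (i, j) for i, j in enumerate(assign)
        if i < n and j < m and cost[i][j] < gate
    ]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n) if i not in matched_t]
    unmatched_dets = [j for j in range(m) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

For example, `associate([(1.0, 0.0), (0.0, 1.0)], [(0.0, 1.0), (1.0, 0.1), (-1.0, 0.0)])` matches track 0 to detection 1 and track 1 to detection 0, leaving detection 2 unmatched (a candidate new track or a re-identification query). In practice, `scipy.optimize.linear_sum_assignment` can replace the hand-written solver; reconciling several trajectory perspectives, as SurgiTrack's harmonizing graph does, would need additional logic not shown here.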