Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context, such as the current procedural phase, has emerged as a promising strategy to improve robustness and interpretability. To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures. We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.