CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai, Ziheng Wang, Guo Rui, Melanie Schellenberg, João L. Vilaça, Tobias Czempiel, Zhenkun Wang, Debdoot Sheet, Shrawan Kumar Thapa, Max Berniker, Patrick Godau, Pedro Morais, Sudarshan Regmi, Thuy Nuong Tran, Jaime Fonseca, Jan-Hinrich Nölke, Estevão Lima, Eduard Vazquez, Lena Maier-Hein, Nassir Navab, Pietro Mascagni, Barbara Seeliger, Cristians Gonzalez, Didier Mutter, Nicolas Padoy

From arXiv: MICCAI EndoVis CholecTriplet2022 challenge report. Submitted to the journal Medical Image Analysis. 22 pages, 14 figures, 6 tables.

Formalizing surgical activities as triplets of the instruments used, the actions performed, and the target anatomies is becoming a gold-standard approach for surgical activity modeling. This formalization yields a more detailed understanding of tool-tissue interaction, which can be used to develop better artificial intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Also estimating the spatial locations of the triplets would offer more precise, context-aware intraoperative decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. The task includes weakly supervised bounding box localization of every visible surgical instrument (or tool), as the key actor, and the modeling of each tool activity in the form of an <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results and their significance, and useful insights for future research directions and applications in surgery.
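
To make the detection output concrete, the following is a minimal Python sketch of how one <instrument, verb, target> triplet with a tool bounding box might be represented and compared against a reference annotation. Everything in it is an illustrative assumption rather than the challenge's actual data format or evaluation protocol: the `TripletDetection` class, the `(x_min, y_min, x_max, y_max)` box convention, the `matches` helper, the 0.5 overlap threshold, and the example labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class TripletDetection:
    """One detected tool activity: an <instrument, verb, target> triplet
    plus the bounding box of the instrument (illustrative structure only)."""
    instrument: str
    verb: str
    target: str
    box: Box
    score: float = 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: TripletDetection, truth: TripletDetection,
            iou_threshold: float = 0.5) -> bool:
    """Assumed matching rule: all three labels agree and the boxes overlap
    by at least `iou_threshold`. The threshold is an assumption."""
    same_labels = (pred.instrument, pred.verb, pred.target) == (
        truth.instrument, truth.verb, truth.target)
    return same_labels and iou(pred.box, truth.box) >= iou_threshold


# Example labels chosen for illustration only.
truth = TripletDetection("grasper", "retract", "gallbladder", (0, 0, 10, 10))
pred = TripletDetection("grasper", "retract", "gallbladder", (0, 0, 10, 5), 0.9)
wrong_verb = TripletDetection("grasper", "grasp", "gallbladder", (0, 0, 10, 10))

print(matches(pred, truth))        # labels agree and boxes overlap enough
print(matches(wrong_verb, truth))  # verb differs, so no match
```

Under this assumed rule, a prediction counts only when the instrument, verb, and target all agree and the instrument box is localized well enough, which is one way to read the move from triplet recognition to triplet detection.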
