Surgical phase recognition is a fundamental component of many context-aware applications in computer- and robot-assisted surgery. In recent years, several methods for automatic surgical phase recognition have been proposed and have shown promising results. However, these methods are difficult to compare meaningfully because evaluation procedures differ and evaluation details are often reported incompletely. In particular, the details of metric computation can vary widely between studies. To raise awareness of potential inconsistencies, this paper summarizes common deviations in the evaluation of phase recognition algorithms on the Cholec80 benchmark. In addition, it provides a structured overview of previously reported evaluation results on Cholec80, taking known differences in evaluation protocols into account. Greater attention to evaluation details could help achieve more consistent and comparable results on the surgical phase recognition task, leading to more reliable conclusions about advancements in the field and, ultimately, to translation into clinical practice.