Dissecting Self-Supervised Learning Methods for Surgical Computer Vision

Sanat Ramesh, Vinkle Srivastav, Deepak Alapatt, Tong Yu, Aditya Murali, Luca Sestini, Chinedu Innocent Nwoye, Idris Hamoud, Saurav Sharma, Antoine Fleurentin, Georgios Exarchakis, Alexandros Karargyris, Nicolas Padoy

The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and largely unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL (up to 7.4% on phase recognition and 20% on tool presence detection) and outperforms state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code will be made available at https://github.com/CAMMA-public/SelfSupSurg.
