Dissecting Self-Supervised Learning Methods for Surgical Computer Vision

Sanat Ramesh, Vinkle Srivastav, Deepak Alapatt, Tong Yu, Aditya Murali, Luca Sestini, Chinedu Innocent Nwoye, Idris Hamoud, Saurav Sharma, Antoine Fleurentin, Georgios Exarchakis, Alexandros Karargyris, Nicolas Padoy

The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL (up to 7.4% on phase recognition and 20% on tool presence detection) and outperforms state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
