Self-supervised learning (SSL) has led to important breakthroughs in computer vision by enabling learning from large amounts of unlabeled data. As such, it may have a pivotal role to play in biomedicine, where annotating data requires highly specialized expertise. Yet there are many healthcare domains in which SSL has not been extensively explored. One such domain is endoscopy, a family of minimally invasive procedures commonly used to detect and treat infections, chronic inflammatory diseases, and cancer. In this work, we study the use of a leading SSL framework, Masked Siamese Networks (MSNs), for the analysis of endoscopic video from procedures such as colonoscopy and laparoscopy. To fully exploit the power of SSL, we create sizable unlabeled endoscopic video datasets for training MSNs. The resulting image representations serve as a foundation for secondary training on limited annotated datasets, yielding state-of-the-art performance on endoscopic benchmarks such as surgical phase recognition during laparoscopy and colonoscopic polyp characterization. Additionally, we achieve a 50% reduction in annotated data size without sacrificing performance. Thus, our work provides evidence that SSL can dramatically reduce the need for annotated data in endoscopy.