Endoscopic video recordings are widely used in minimally invasive robot-assisted surgery, but when the endoscope is outside the patient's body, it can capture irrelevant segments that may contain sensitive information. To address this, we propose a framework that detects out-of-body frames in surgical videos by combining self-supervision with a minimal number of labels. We use a large amount of unlabeled endoscopic images to learn meaningful representations in a self-supervised manner. Our approach, which pre-trains on an auxiliary task and then fine-tunes with limited supervision, outperforms previous methods for detecting out-of-body frames in surgical videos captured from da Vinci X and Xi surgical systems, with average F1 scores ranging from 96.00 to 98.02. Remarkably, using only 5% of the training labels, our approach still maintains an average F1 score above 97, outperforming fully supervised methods while using 95% fewer labels. These results demonstrate the potential of our framework to facilitate the safe handling of surgical video recordings and enhance data privacy protection in minimally invasive surgery.
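To make the reported quantities concrete, the following minimal sketch shows how an F1 score for the out-of-body class and a 5% label subset could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `f1_score` and `subsample_labels`, the label convention (1 = out-of-body frame), and the toy data are all hypothetical, and the abstract does not say how its F1 scores are averaged. The abstract reports F1 on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (assumed here: 1 = out-of-body frame)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def subsample_labels(indices, fraction, seed=0):
    """Randomly keep a fraction of labeled frame indices for fine-tuning."""
    indices = list(indices)
    rng = random.Random(seed)
    k = max(1, round(len(indices) * fraction))
    return sorted(rng.sample(indices, k))


# Toy per-frame labels and predictions (fabricated for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = f1_score(y_true, y_pred)

# Keep 5% of 1000 hypothetical labeled frames.
labeled = subsample_labels(range(1000), 0.05)
```

In a low-label setting like the one described, only the frames selected by something like `subsample_labels` would carry labels during fine-tuning, while the self-supervised pre-training stage would use the images without labels.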