A common problem in neural-network-based segmentation of medical images is the difficulty of obtaining a sufficient amount of pixel-level annotated data for training. To address this issue, we propose a semi-supervised segmentation network based on contrastive learning. In contrast to previous state-of-the-art methods, we introduce Min-Max Similarity (MMS), a contrastive-learning form of dual-view training that employs classifiers to build all-negative feature pairs and projectors to build positive and negative feature pairs, formulating learning as an MMS problem. The all-negative pairs are used to supervise the networks learning from different views and to capture general features, while the consistency of predictions on unlabeled data is measured by a pixel-wise contrastive loss between positive and negative pairs. To evaluate the proposed method quantitatively and qualitatively, we test it on four public endoscopy surgical tool segmentation datasets and on one cochlear implant surgery dataset that we manually annotated. Results indicate that the proposed method consistently outperforms state-of-the-art semi-supervised and fully supervised segmentation algorithms. In addition, our semi-supervised segmentation algorithm can successfully recognize unknown surgical tools and provide good predictions. Finally, our MMS approach can achieve inference speeds of about 40 frames per second (fps), making it suitable for real-time video segmentation.
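The abstract does not state the loss formula. As a rough illustration of what a pixel-wise contrastive loss between positive and negative pairs can look like, the sketch below uses a generic InfoNCE-style objective with cosine similarity. It is not the MMS formulation itself: the function names, the temperature value, and the choice to treat the same pixel in the other view as the positive and all other pixels as negatives are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero


def pixel_contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss over per-pixel features from two views (assumed setup).

    view_a[i] and view_b[i] describe the same pixel and form the positive pair;
    view_a[i] and view_b[j] with j != i are treated as negative pairs.
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("both views need the same non-zero number of pixels")

    total = 0.0
    for i, feature in enumerate(view_a):
        logits = [cosine_similarity(feature, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over the positive and all negatives.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Negative log-probability of picking the positive pair.
        total += log_denominator - logits[i]
    return total / len(view_a)


# Two toy "images" with two pixels each and 2-D projected features.
consistent = pixel_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.0], [0.0, 1.0]])
swapped = pixel_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                 [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when the two views agree pixel by pixel and large when the features are swapped, which is the behavior one would want from a consistency measure on unlabeled predictions.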