Semi-supervised learning has demonstrated great potential in medical image segmentation by utilizing knowledge from unlabeled data. However, most existing approaches do not explicitly capture high-level semantic relations between distant regions, which limits their performance. In this paper, we focus on representation learning for semi-supervised learning and develop a novel Multi-Scale Cross Supervised Contrastive Learning (MCSC) framework to segment structures in medical images. We jointly train CNN and Transformer models, regularising their features to be semantically consistent across different scales. Our approach contrasts multi-scale features based on ground-truth and cross-predicted labels to extract robust feature representations that reflect intra- and inter-slice relationships across the whole dataset. To tackle class imbalance, we use the prevalence of each class to guide contrastive learning, so that the features adequately capture infrequent classes. Extensive experiments on two multi-structure medical segmentation datasets demonstrate the effectiveness of MCSC. It not only outperforms state-of-the-art semi-supervised methods by more than 3.0% in Dice, but also greatly reduces the performance gap with fully supervised methods.
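To make the idea of label-guided contrastive learning with class-prevalence weighting more concrete, the following is a minimal sketch and not the MCSC implementation. It assumes a standard supervised contrastive loss over a small set of feature vectors, with each anchor weighted by the inverse frequency of its class. The function names (`l2_normalize`, `class_weighted_supcon_loss`), the temperature value, and the inverse-frequency weighting scheme are all illustrative assumptions; the abstract does not specify how prevalence enters the loss. In the setting described above, the labels would be either ground-truth labels or the predictions of the other network (CNN or Transformer).

```python
import math
from collections import Counter


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def class_weighted_supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss with inverse-frequency anchor weights.

    features: list of equal-length float vectors (hypothetical patch embeddings)
    labels:   list of class ids, ground truth or cross-predicted pseudo-labels
    The weighting scheme is an assumption made for illustration only.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    counts = Counter(labels)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of picking a same-class sample.
        anchor_loss = -sum(sims[j] - log_denom for j in positives) / len(positives)
        w = 1.0 / counts[labels[i]]  # rarer classes receive larger weights
        total += w * anchor_loss
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Toy example: two tight clusters in 2-D feature space.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
aligned = class_weighted_supcon_loss(feats, [0, 0, 1, 1, 1])
shuffled = class_weighted_supcon_loss(feats, [0, 1, 0, 1, 1])
```

In this toy example, labels that agree with the feature clusters (`aligned`) yield a lower loss than labels that cut across them (`shuffled`), which is the behaviour a contrastive objective is meant to encourage. A multi-scale variant would apply the same loss to features taken at several resolutions.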