The quality of deep convolutional neural network predictions depends strongly on the size of the training dataset and the quality of the annotations. Creating annotations, especially for 3D medical image segmentation, is time-consuming and requires expert knowledge. We propose a novel semi-supervised learning (SSL) approach that requires only a relatively small number of annotations while using the remaining unlabeled data to improve model performance. Our method uses a pseudo-labeling technique that employs recent deep learning uncertainty estimation models. Using the estimated uncertainty, we ranked the pseudo-labels and automatically selected the best pseudo-annotations generated by the supervised model. We applied this approach to prostate zonal segmentation in T2-weighted MRI scans. In experiments on the ProstateX dataset and an external test set, our proposed model outperformed the semi-supervised model while leveraging only a subset of the unlabeled data rather than the full collection of 4953 cases. The Dice similarity coefficient for the transition zone and peripheral zone increased from 0.835 and 0.727 with the fully supervised model to 0.852 and 0.751, respectively, with the uncertainty-aware semi-supervised learning (USSL) model. Our USSL model demonstrates the potential to train deep learning models on large datasets without requiring full annotation. Our code is available at https://github.com/DIAGNijmegen/prostateMR-USSL.
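The uncertainty-based ranking and selection of pseudo-labels described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that each unlabeled case has several stochastic softmax predictions (for example from an ensemble or repeated dropout passes), that case-level uncertainty is the mean voxel-wise predictive entropy, and that a fixed fraction of the most certain cases is kept. The function names, the `keep_fraction` parameter, and the three-class layout (background, transition zone, peripheral zone) are hypothetical choices made for illustration.

```python
import numpy as np


def predictive_entropy(prob_samples, eps=1e-12):
    """Voxel-wise entropy of the mean softmax prediction.

    prob_samples: array of shape (n_samples, n_classes, *spatial),
    one softmax output per stochastic forward pass (assumed input format).
    """
    mean_probs = prob_samples.mean(axis=0)  # (n_classes, *spatial)
    return -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)


def case_uncertainty(prob_samples):
    """Summarize one case by its mean voxel-wise entropy (an assumption)."""
    return float(predictive_entropy(prob_samples).mean())


def select_pseudo_labels(case_probs, keep_fraction=0.5):
    """Rank cases by uncertainty and keep the most certain fraction.

    case_probs: dict mapping a case id to its prob_samples array.
    Returns a dict mapping each selected case id to its pseudo-label map,
    taken as the argmax of the mean prediction.
    """
    scores = {cid: case_uncertainty(p) for cid, p in case_probs.items()}
    ranked = sorted(scores, key=scores.get)  # lowest uncertainty first
    n_keep = max(1, int(keep_fraction * len(ranked)))
    return {
        cid: case_probs[cid].mean(axis=0).argmax(axis=0)
        for cid in ranked[:n_keep]
    }


if __name__ == "__main__":
    # Toy data: 3 passes, 3 classes, a 2x2x2 volume per case.
    confident = np.zeros((3, 3, 2, 2, 2))
    confident[:, 1] = 1.0  # every pass predicts class 1 everywhere
    uncertain = np.full((3, 3, 2, 2, 2), 1.0 / 3.0)  # uniform prediction
    chosen = select_pseudo_labels({"a": uncertain, "b": confident})
    print(sorted(chosen))
```

In a pipeline of this kind, the selected pseudo-labeled cases would be added to the annotated set before retraining, while the discarded high-uncertainty cases stay out of the training data. How the threshold or fraction is chosen is a design decision that this sketch leaves open.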