Deep learning models have demonstrated remarkable success in multi-organ segmentation but typically require large-scale datasets in which all organs of interest are annotated. However, medical image datasets are often small and only partially labeled, i.e., only a subset of organs is annotated. Therefore, it is crucial to investigate how to learn a unified model from the available partially labeled datasets to leverage their synergistic potential. In this paper, we empirically and systematically study partial-label segmentation with in-depth analyses of existing approaches and identify three distinct types of supervision signals: two derived from ground truth and one from pseudo labels. We propose a novel training framework termed COSST, which effectively and efficiently integrates comprehensive supervision signals with self-training. Concretely, we first train an initial unified model using the two ground truth-based signals and then iteratively incorporate the pseudo-label signal into the initial model via self-training. To mitigate performance degradation caused by unreliable pseudo labels, we assess the reliability of pseudo labels via outlier detection in latent space and exclude the most unreliable pseudo labels from each self-training iteration. Extensive experiments are conducted on six CT datasets for three partial-label segmentation tasks. Experimental results show that our proposed COSST achieves significant improvement over the baseline method, i.e., individual networks trained on each partially labeled dataset. Compared to state-of-the-art partial-label segmentation methods, COSST demonstrates consistently superior performance across segmentation tasks and training data sizes.
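The pseudo-label filtering step described above can be illustrated with a minimal sketch. The abstract does not specify the outlier detector, the latent features, or the fraction of pseudo labels that is excluded, so everything here is an assumption made for illustration: the function names `outlier_scores` and `select_reliable`, the `drop_fraction` parameter, and the use of a per-dimension standardized Euclidean distance as the outlier score are hypothetical and are not taken from COSST itself.

```python
import math
import random
import statistics


def outlier_scores(reference, candidates):
    """Standardized Euclidean distance of each candidate to the reference features.

    `reference` stands in for latent features of ground-truth labels and
    `candidates` for latent features of pseudo labels (both hypothetical).
    """
    dims = list(zip(*reference))
    means = [statistics.fmean(d) for d in dims]
    # Fall back to 1.0 if a dimension has zero spread, to avoid division by zero.
    stds = [statistics.pstdev(d) or 1.0 for d in dims]
    return [
        math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(vec, means, stds)))
        for vec in candidates
    ]


def select_reliable(reference, candidates, drop_fraction=0.2):
    """Return indices of candidates kept after dropping the highest-scoring fraction."""
    scores = outlier_scores(reference, candidates)
    n_drop = int(drop_fraction * len(candidates))
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i])
    keep = sorted(ranked[: len(candidates) - n_drop])
    return keep, scores


# Toy data: 200 reference vectors, 8 typical candidates, 2 obvious outliers.
random.seed(0)
reference = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
candidates = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
candidates += [[10.0] * 4, [-10.0] * 4]

keep, scores = select_reliable(reference, candidates, drop_fraction=0.2)
print("kept pseudo-label indices:", keep)
```

In a self-training loop of the kind the abstract describes, a step like this would run once per iteration, with only the retained pseudo labels used for the next round of training. The sketch covers the filtering step only and omits model training.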