A versatile medical image segmentation model applicable to imaging data collected with diverse equipment and protocols can facilitate model deployment and maintenance. However, building such a model typically requires a large, diverse, and fully annotated dataset, which is rarely available because data curation is labor-intensive and costly. In this study, we develop a cost-efficient method that harnesses readily available data with partially or even sparsely annotated segmentation labels. We devise strategies for model self-disambiguation, prior knowledge incorporation, and imbalance mitigation to address the challenges of inconsistently labeled data from various sources, including label ambiguity and imbalances across modalities, datasets, and segmentation labels. Experimental results on a multi-modal dataset compiled from eight different sources for abdominal organ segmentation demonstrate our method's effectiveness and its superior performance over alternative state-of-the-art methods, highlighting its potential for optimizing the use of existing annotated data and reducing the annotation effort for new data to further enhance model capability.