Purpose: Manual annotation of training data for deep learning (DL) auto-segmentation models is time-intensive. This study introduces a hybrid representation-enhanced sampling strategy that integrates density and diversity criteria into an uncertainty-based Bayesian active learning (BAL) framework, reducing annotation effort by selecting the most informative training samples. Methods: Experiments were performed on two lower extremity (LE) datasets of MRI and CT images, using a U-net-based BAL framework to segment the femur, pelvis, sacrum, quadriceps femoris, hamstrings, adductors, sartorius, and iliopsoas. Our method selects uncertain samples with high density and diversity for manual revision, favoring samples that are maximally similar to the unlabeled instances and minimally similar to the existing training data. We assess accuracy using the Dice coefficient and efficiency using a proposed metric called reduced annotation cost (RAC). We further evaluate the impact of various acquisition rules on BAL performance and conduct an ablation study to estimate the effectiveness of the proposed criteria. Results: On the MRI and CT datasets, our method was superior or comparable to existing methods. In volume-wise acquisition, it achieved increases of 0.8% in Dice and 1.0% in RAC on CT (statistically significant), and of 0.8% in Dice and 1.1% in RAC on MRI (not statistically significant). The ablation study indicates that combining the density and diversity criteria makes BAL more efficient in musculoskeletal segmentation than using either criterion alone. Conclusion: Our sampling method reduced annotation costs in the image segmentation tasks studied. Combined with our BAL framework, it provides a semi-automatic way to annotate medical image datasets efficiently.
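The selection rule described in the Methods (uncertain samples that are similar to the unlabeled pool and dissimilar to the current training data) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the use of cosine similarity on feature vectors, the mean-similarity definition of density, the one-minus-maximum-similarity definition of diversity, and the equal-weight additive combination are all hypothetical and may differ from the method evaluated in the study.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    eps = 1e-12  # guards against division by zero for all-zero feature vectors
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T


def hybrid_scores(uncertainty, pool_feats, train_feats):
    """Score each unlabeled sample (hypothetical formulation).

    uncertainty : (n,) per-sample uncertainty from the Bayesian model
    pool_feats  : (n, d) representations of the unlabeled samples
    train_feats : (m, d) representations of the labeled training samples
    """
    n = pool_feats.shape[0]

    # Density: mean similarity to the other unlabeled samples (self excluded).
    sim_pool = cosine_similarity_matrix(pool_feats, pool_feats)
    density = (sim_pool.sum(axis=1) - np.diag(sim_pool)) / max(n - 1, 1)

    # Diversity: one minus the similarity to the closest training sample.
    sim_train = cosine_similarity_matrix(pool_feats, train_feats)
    diversity = 1.0 - sim_train.max(axis=1)

    # Assumption: equal-weight sum of the three criteria.
    return uncertainty + density + diversity


def select_for_annotation(uncertainty, pool_feats, train_feats, k):
    """Return the indices of the k highest-scoring unlabeled samples."""
    scores = hybrid_scores(uncertainty, pool_feats, train_feats)
    return np.argsort(-scores)[:k]
```

In a BAL loop, the indices returned by `select_for_annotation` would be sent for manual revision and then moved from the pool to the training set before retraining. How the three criteria are weighted in practice is a design choice that this sketch does not settle.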