Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities

Purpose: Obtaining manual annotations to train deep learning (DL) models for auto-segmentation is often time-consuming. Uncertainty-based Bayesian active learning (BAL) is a widely adopted method for reducing annotation effort. Building on BAL, this study introduces a hybrid representation-enhanced sampling strategy that integrates density and diversity criteria to reduce manual annotation costs by efficiently selecting the most informative samples.

Methods: Experiments are performed on two lower extremity (LE) datasets of MRI and CT images using a BAL framework based on a Bayesian U-Net. Our method selects uncertain samples with high density and diversity for manual revision, favoring samples with maximal similarity to unlabeled instances and minimal similarity to existing training data. We assess accuracy using the Dice score and efficiency using a proposed metric called reduced annotation cost (RAC). We further evaluate the impact of different acquisition rules on BAL performance and conduct an ablation study to assess the effectiveness of each criterion.

Results: The proposed method was superior or non-inferior to other methods on both datasets under two acquisition rules, and the quantitative results reveal the pros and cons of each acquisition rule. The ablation study with volume-wise acquisition shows that combining the density and diversity criteria outperforms using either criterion alone in musculoskeletal segmentation.

Conclusion: Our sampling method is shown to be efficient in reducing annotation costs in image segmentation tasks. Combined with our BAL framework, the proposed method provides a semi-automatic approach to efficiently annotating medical image datasets.

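The selection step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature representation, the similarity measure, or how the criteria are combined, so this sketch assumes that each sample already has a feature vector and an uncertainty score, uses cosine similarity, and simply adds the density and diversity scores. The names `cosine_similarity_matrix` and `hybrid_select` and the parameters `n_candidates` and `n_select` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_norm @ b_norm.T


def hybrid_select(unlabeled_feats, labeled_feats, uncertainty,
                  n_candidates, n_select):
    """Return indices (into the unlabeled pool) of samples to annotate.

    Assumed scoring, for illustration only:
      density   = mean similarity to the unlabeled pool (higher is better)
      diversity = 1 - max similarity to the labeled set (higher is better)
      score     = density + diversity
    """
    # Step 1: keep only the most uncertain samples as candidates.
    candidates = np.argsort(-uncertainty)[:n_candidates]
    cand_feats = unlabeled_feats[candidates]

    # Step 2: density, i.e. how representative a candidate is of the pool.
    # The self-similarity term (1.0) is included; it shifts every
    # candidate's mean by the same amount, so the ranking is unaffected.
    density = cosine_similarity_matrix(cand_feats, unlabeled_feats).mean(axis=1)

    # Step 3: diversity, i.e. how different a candidate is from the
    # data that is already in the training set.
    diversity = 1.0 - cosine_similarity_matrix(cand_feats, labeled_feats).max(axis=1)

    # Step 4: combine the criteria (plain sum, an assumption) and rank.
    score = density + diversity
    best = np.argsort(-score)[:n_select]
    return candidates[best]


# Toy usage with 2-D features: four unlabeled samples, one labeled sample.
unlabeled = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.9, 0.1]])
labeled = np.array([[0.0, 1.0]])
uncertainty = np.array([0.9, 0.8, 0.7, 0.1])
print(hybrid_select(unlabeled, labeled, uncertainty, n_candidates=3, n_select=2))
```

In this toy pool, the third sample is uncertain but nearly identical to the labeled sample, so its diversity score is close to zero and it is ranked below the two candidates that resemble the rest of the unlabeled pool. A weighted sum or a sequential filter would be equally plausible ways to combine the criteria.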