MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis

Cold-Start Active Learning (CSAL) aims to select informative samples for annotation without prior knowledge, which is important for improving annotation efficiency and model performance under a limited annotation budget in medical image analysis. Most existing CSAL methods rely on Self-Supervised Learning (SSL) on the target dataset for feature extraction, which is inefficient and limited by insufficient feature representation. Recently, pre-trained Foundation Models (FMs) have shown powerful feature extraction ability, with potential for better CSAL. However, this paradigm has rarely been investigated, and benchmarks for comparing FMs in CSAL tasks are lacking. To this end, we propose MedCAL-Bench, the first systematic FM-based CSAL benchmark for medical image analysis. We evaluate 14 FMs and 7 CSAL strategies across 7 datasets under different annotation budgets, covering classification and segmentation tasks from diverse medical modalities. It is also the first CSAL benchmark that evaluates both the feature extraction and sample selection stages. Our experimental results reveal that: 1) Most FMs are effective feature extractors for CSAL, with the DINO family performing the best in segmentation; 2) The performance differences among these FMs are large in segmentation tasks but small in classification tasks; 3) Different sample selection strategies should be considered for CSAL on different datasets, with Active Learning by Processing Surprisal (ALPS) performing the best in segmentation and RepDiv leading in classification. The code is available at https://github.com/HiLab-git/MedCAL-Bench.
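
To make the two-stage structure (feature extraction, then sample selection) concrete, the following is a minimal sketch of the selection stage only. It assumes the features have already been extracted by some frozen FM and are passed in as plain lists of floats. The function name `select_cold_start` is hypothetical, and the strategy shown is a generic greedy k-center (farthest-point) diversity heuristic. It is used here purely for illustration and is not claimed to be ALPS, RepDiv, or any other strategy evaluated in MedCAL-Bench.

```python
import math
import random


def select_cold_start(features, budget):
    """Greedy k-center (farthest-point) selection over feature vectors.

    features: list of equal-length float lists (stand-ins for FM embeddings).
    Returns `budget` distinct sample indices to send for annotation.
    """
    n = len(features)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")
    dim = len(features[0])
    # No labels or task model exist yet, so seed with the sample
    # nearest to the mean feature vector.
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(features[i], mean))
    selected = [first]
    # min_dist[i] = distance from sample i to its nearest selected sample.
    min_dist = [math.dist(f, features[first]) for f in features]
    min_dist[first] = -1.0  # mark as taken so it is never picked again
    while len(selected) < budget:
        # Pick the sample that is farthest from everything selected so far.
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], math.dist(features[i], features[nxt]))
        min_dist[nxt] = -1.0
    return selected


if __name__ == "__main__":
    random.seed(0)
    # Assumption: random vectors stand in for real FM embeddings.
    fake_features = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
    picked = select_cold_start(fake_features, budget=10)
    print(picked)
```

In a real pipeline the random vectors would be replaced by embeddings from a pre-trained FM, and the returned indices would be the samples sent for annotation under the given budget.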
