Annotating gigapixel histological whole slide images (WSIs) at the pixel level to train a supervised segmentation model is time-consuming. Region-based active learning (AL) trains the model on a limited number of annotated image regions instead of requesting annotations of entire images. These annotation regions are selected iteratively, with the goal of maximizing model performance while minimizing the annotated area. The standard region selection method evaluates the informativeness of all square regions of a specified size and then selects a fixed number of the most informative ones. We find that the efficiency of this method depends strongly on the choice of AL step size (i.e., the combination of region size and the number of regions selected per WSI), and that a suboptimal AL step size can result in redundant annotation requests or inflated computation costs. This paper introduces a novel technique for selecting annotation regions adaptively, which reduces the reliance on this AL hyperparameter. Specifically, we determine each region dynamically by first identifying an informative area and then detecting its optimal bounding box, rather than selecting regions of a uniform, predefined shape and size as in the standard method. We evaluate our method on the task of breast cancer metastases segmentation using the public CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency than the standard method across various AL step sizes. With only 2.6% of the tissue area annotated, we match the performance obtained with full annotation and thereby substantially reduce the cost of annotating a WSI dataset. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.
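To make the standard selection rule concrete, the following is a minimal, dependency-free sketch of it, not the implementation from the linked repository. It assumes that informativeness is available as a 2D map of per-pixel scores (for example, prediction uncertainty), that a region's informativeness is the sum of the scores it covers, and that selected regions must not overlap. None of these choices is stated in the abstract, and the names `window_sums` and `select_regions` are hypothetical.

```python
def window_sums(info_map, size):
    """Return {(y, x): score} for every size x size window, keyed by its top-left corner."""
    h, w = len(info_map), len(info_map[0])
    # Summed-area table with a zero border: sat[y][x] holds the sum of info_map[:y][:x].
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (
                info_map[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
            )
    sums = {}
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            # Four table lookups give the window sum in constant time.
            sums[(y, x)] = (
                sat[y + size][x + size]
                - sat[y][x + size]
                - sat[y + size][x]
                + sat[y][x]
            )
    return sums


def select_regions(info_map, size, k):
    """Greedily pick up to k non-overlapping square regions with the highest scores.

    Assumption: overlap is forbidden; the abstract does not specify this.
    """
    ranked = sorted(
        window_sums(info_map, size).items(), key=lambda item: item[1], reverse=True
    )
    chosen = []
    for (y, x), _score in ranked:
        if len(chosen) >= k:
            break
        # Two equally sized squares overlap iff their corners differ by less
        # than `size` along both axes.
        if all(abs(y - cy) >= size or abs(x - cx) >= size for cy, cx in chosen):
            chosen.append((y, x))
    return chosen


if __name__ == "__main__":
    toy_map = [
        [9, 9, 0, 0, 0, 0],
        [9, 9, 0, 0, 0, 0],
        [0, 0, 0, 0, 3, 3],
        [0, 0, 0, 0, 3, 3],
    ]
    print(select_regions(toy_map, size=2, k=2))
```

In this sketch the AL step size corresponds to the pair (`size`, `k`). Every candidate window is scored regardless of how many regions are requested, and the region shape is fixed in advance, which are the two properties the adaptive method is designed to relax.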