Whole slide image (WSI) classification requires pathologists to repeatedly zoom in and out, as only small portions of the slide may be relevant to detecting cancer. Because patch-level labels are unavailable, multiple instance learning (MIL) is commonly used to train WSI classifiers. One challenge in MIL for WSIs is that supervision comes only from slide-level labels, which often results in severe overfitting. In response, researchers have considered patch-level augmentation or mixup augmentation, but the applicability of these techniques remains unverified. Our approach augments the training dataset by sampling a subset of patches in each WSI without significantly altering the underlying semantics of the original slide. Additionally, we introduce an efficient model (Slot-MIL) that uses an attention mechanism to organize patches into a fixed number of slots, which serve as abstract representations of the patches. We empirically demonstrate that the subsampling augmentation yields more informative slots by restricting the over-concentration of attention, and that it improves interpretability. Finally, we show that combining our attention-based aggregation model with subsampling and mixup, which has shown limited compatibility in existing MIL methods, can enhance both generalization and calibration. Our proposed methods achieve state-of-the-art performance across various benchmark datasets, including settings with class imbalance and distribution shift.
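The three ingredients named above (patch subsampling, attention-based aggregation into a fixed number of slots, and mixup) can be illustrated with a minimal sketch. It is not the Slot-MIL implementation. The function names, the single-head dot-product attention with a softmax over patches, the fixed random slot queries, and the choice to apply mixup at the slot level are all assumptions made for illustration. The sketch uses only the Python standard library and treats each slide as a list of patch feature vectors.

```python
import math
import random


def subsample_bag(bag, keep_ratio, rng):
    """Return a random subset of the patch features (hypothetical helper).

    Keeps round(len(bag) * keep_ratio) patches, and always at least one.
    """
    n_keep = max(1, round(len(bag) * keep_ratio))
    return rng.sample(bag, n_keep)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def slot_pool(bag, slot_queries):
    """Aggregate a variable-size bag into len(slot_queries) slots.

    Assumption: each slot is an attention-weighted average of the patches,
    using scaled dot-product scores between a slot query and every patch.
    """
    dim = len(slot_queries[0])
    slots = []
    for query in slot_queries:
        scores = [
            sum(q * x for q, x in zip(query, patch)) / math.sqrt(dim)
            for patch in bag
        ]
        weights = softmax(scores)
        slots.append(
            [sum(w * patch[d] for w, patch in zip(weights, bag)) for d in range(dim)]
        )
    return slots


def mixup_slots(slots_a, slots_b, label_a, label_b, lam):
    """Convexly combine two slot sets and their labels (assumed slot-level mixup)."""
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(slot_a, slot_b)]
        for slot_a, slot_b in zip(slots_a, slots_b)
    ]
    return mixed, lam * label_a + (1.0 - lam) * label_b


# Toy data: two "slides" with different numbers of patches (random features).
rng = random.Random(0)
dim, num_slots = 4, 2
bag_a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
bag_b = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(80)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)]

# Subsample each bag, pool to a fixed number of slots, then mix the slots.
sub_a = subsample_bag(bag_a, 0.4, rng)
sub_b = subsample_bag(bag_b, 0.4, rng)
slots_a = slot_pool(sub_a, queries)
slots_b = slot_pool(sub_b, queries)
mixed_slots, mixed_label = mixup_slots(slots_a, slots_b, 1.0, 0.0, lam=0.7)
```

In this sketch the two slides contain different numbers of patches, yet both are reduced to the same number of slots, so the two representations can be interpolated element by element. That shape alignment is one plausible reading of why a slot-based aggregator combines naturally with mixup. A real model would learn the slot queries and feed the slots to a classifier.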