Whole slide image (WSI) classification is often formulated as a multiple instance learning (MIL) problem. Since positive tissue occupies only a small fraction of a gigapixel WSI, existing MIL methods intuitively focus on identifying salient instances via attention mechanisms. However, this biases the model towards easy-to-classify instances while neglecting hard-to-classify ones. Prior work has shown that hard examples help a model learn an accurate discriminative boundary. Applying this idea at the instance level, we develop a novel MIL framework with masked hard instance mining (MHIM-MIL), which uses a Siamese (Teacher-Student) structure with a consistency constraint to explore potential hard instances. Using several instance masking strategies based on attention scores, MHIM-MIL employs a momentum teacher to implicitly mine hard instances for training the student model, which can be any attention-based MIL model. This counter-intuitive strategy enables the student to learn a better discriminative boundary. Moreover, the teacher is updated from the student via an exponential moving average (EMA). The updated teacher in turn identifies new hard instances for subsequent training iterations and stabilizes optimization. Experimental results on the CAMELYON-16 and TCGA Lung Cancer datasets demonstrate that MHIM-MIL outperforms other recent methods in both performance and training cost. The code is available at: https://github.com/DearCaat/MHIM-MIL.
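The two core mechanisms described above, attention-based masking of salient instances and the EMA update of the teacher, can be illustrated with a minimal sketch. It assumes plain Python lists in place of tensors, a single masking strategy that drops a fixed fraction of the instances the teacher attends to most, and an EMA applied to a flat parameter vector. The names `mask_salient_instances`, `attention_pool`, `ema_update`, `mask_ratio`, and `momentum` are hypothetical and do not reflect the API of the MHIM-MIL repository. The consistency constraint between teacher and student is omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one bag's attention logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_salient_instances(attention: Sequence[float], mask_ratio: float) -> List[int]:
    """Hypothetical high-attention masking strategy.

    Drops the top `mask_ratio` share of instances ranked by the teacher's
    attention and returns the remaining indices in their original order.
    The remaining, less salient instances form the "hard" bag for the student.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    n = len(attention)
    num_masked = int(n * mask_ratio)
    ranked = sorted(range(n), key=lambda i: attention[i], reverse=True)
    masked = set(ranked[:num_masked])
    return [i for i in range(n) if i not in masked]


def attention_pool(features: Sequence[Sequence[float]],
                   logits: Sequence[float],
                   keep: Sequence[int]) -> List[float]:
    """Attention-weighted mean of the kept instance features.

    A toy stand-in for the student's attention-based MIL aggregator:
    attention is renormalized over the kept instances only.
    """
    weights = softmax([logits[i] for i in keep])
    dim = len(features[0])
    return [sum(w * features[i][d] for w, i in zip(weights, keep)) for d in range(dim)]


def ema_update(teacher: Sequence[float], student: Sequence[float],
               momentum: float) -> List[float]:
    """Exponential moving average: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    # Teacher attention over a toy bag of five instances.
    teacher_attention = softmax([2.0, 0.1, 3.5, -1.0, 0.7])
    keep = mask_salient_instances(teacher_attention, mask_ratio=0.4)
    print(keep)  # [1, 3, 4]: the two most-attended instances (2 and 0) are masked

    # The student aggregates only the unmasked instances (2-D toy features).
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
    student_logits = [0.3, 0.1, 0.9, -0.2, 0.4]
    print(attention_pool(feats, student_logits, keep))

    # After a student step, the teacher's parameters drift slowly toward it.
    print(ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9))  # approximately [0.9, 0.1]
```

In this sketch the teacher scores the full bag while the student sees only the masked bag, so the student cannot rely on the most salient instances. A high `momentum` keeps the teacher's attention, and therefore the set of mined instances, changing slowly between iterations.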