Learning representations for individual instances when only bag-level labels are available is a fundamental challenge in multiple instance learning (MIL). Recent work has shown promising results with contrastive self-supervised learning (CSSL), which learns representations by pushing apart the representations of two different, randomly selected instances. Unfortunately, real-world applications such as medical image classification often exhibit class imbalance, so randomly selected instances mostly belong to the same majority class, which prevents CSSL from learning inter-class differences. To address this issue, we propose a novel framework, Iterative Self-paced Supervised Contrastive Learning for MIL Representations (ItS2CLR), which improves the learned representation by exploiting instance-level pseudo labels derived from the bag-level labels. The framework employs a novel self-paced sampling strategy to ensure the accuracy of the pseudo labels. We evaluate ItS2CLR on three medical datasets, showing that it improves the quality of instance-level pseudo labels and representations, and outperforms existing MIL methods in both bag-level and instance-level accuracy. Code is available at https://github.com/Kangningthu/ItS2CLR
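The core idea can be illustrated with a small, hypothetical sketch. Under imbalance, random pairs are rarely informative: if, say, 5% of instances were positive (an illustrative figure, not from the datasets), a random pair would consist of two negatives with probability 0.95^2 = 0.9025. Pseudo labels let a supervised contrastive loss contrast instances across classes instead. The sketch below assumes a simplified labeling rule (instances in negative bags are labeled 0; in each positive bag, only the top `ratio` fraction by instance score is labeled 1 and the rest are left unlabeled) and a linear self-paced schedule that grows `ratio` over training. The functions `self_paced_ratio`, `pseudo_labels`, and `supcon_loss` are hypothetical names; this is not the ItS2CLR implementation, and the loss is a generic supervised contrastive (SupCon-style) objective written in NumPy.

```python
import numpy as np


def self_paced_ratio(epoch, total_epochs, r_start=0.1, r_end=0.5):
    """Hypothetical linear schedule: the fraction of instances per positive bag
    that receive a positive pseudo label grows as training proceeds."""
    progress = min(epoch / total_epochs, 1.0)
    return r_start + (r_end - r_start) * progress


def pseudo_labels(scores, bag_ids, bag_labels, ratio):
    """Derive instance pseudo labels from bag labels (simplified assumption).

    Instances in negative bags get label 0. In each positive bag, only the
    top `ratio` fraction of instances by score get label 1; the remaining
    instances stay unlabeled (-1) and are excluded from the loss.
    """
    labels = np.full(len(scores), -1, dtype=int)
    for bag, y in bag_labels.items():
        idx = np.flatnonzero(bag_ids == bag)
        if y == 0:
            labels[idx] = 0
        else:
            k = max(1, int(np.ceil(ratio * len(idx))))
            # Indices of the k highest-scoring instances in this positive bag.
            labels[idx[np.argsort(-scores[idx])[:k]]] = 1
    return labels


def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over labeled instances: each anchor is pulled
    toward same-label instances and pushed away from all other instances."""
    keep = labels >= 0
    z, labels = z[keep], labels[keep]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)  # exclude self-pairs
    # Row-wise log-softmax over all other instances (stable log-sum-exp).
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0  # anchors without any positive are skipped
    mean_log_prob_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return -mean_log_prob_pos.mean()
```

For example, with `bag_ids = np.array([0, 0, 0, 1, 1, 1, 1])`, `bag_labels = {0: 0, 1: 1}`, scores `np.array([0.9, 0.1, 0.2, 0.8, 0.1, 0.6, 0.3])`, and `ratio = 0.5`, the negative bag's three instances are labeled 0, the two highest-scoring instances of the positive bag are labeled 1, and the other two stay unlabeled. Starting from a small `ratio` and enlarging it gradually reflects the self-paced intuition that the most confident positives are the most likely to be labeled correctly.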