Whole slide image (WSI) classification is an essential task in computational pathology. Despite recent advances in multiple instance learning (MIL) for WSI classification, accurate classification remains challenging for two reasons: the extreme imbalance between positive and negative instances within a bag, and the complicated pre-processing required to fuse the multi-scale information of a WSI. To address these issues, we propose a novel multi-scale prototypical Transformer (MSPT) for WSI classification, which consists of a prototypical Transformer (PT) module and a multi-scale feature fusion module (MFFM). The PT reduces redundant instances in bags by integrating prototypical learning into the Transformer architecture: it replaces all instances with cluster prototypes, which are then re-calibrated through the self-attention mechanism of the Transformer. The MFFM then fuses the clustered prototypes from different scales, employing MLP-Mixer to enhance the communication of information between prototypes. Experimental results on two public WSI datasets demonstrate that the proposed MSPT outperforms all the compared algorithms, indicating its potential for practical application.
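The pipeline described in the abstract (cluster each bag into prototypes, re-calibrate the prototypes with self-attention, then fuse prototypes from several scales with a Mixer-style block) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: plain k-means as the clustering step, a single attention head with random weights, ReLU in place of the activation used by MLP-Mixer, no layer normalization, mean pooling for the slide-level embedding, and the function names, feature dimension, prototype count, and the "20x"/"5x" scale labels.

```python
import numpy as np


def kmeans_prototypes(instances, k, iters=10, seed=0):
    """Reduce a bag of instance features (n, d) to k cluster prototypes (k, d).

    Plain k-means is an assumption; the abstract only says "cluster prototypes".
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(instances), size=k, replace=False)
    centers = instances[idx].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every instance to every center: (n, k).
        d2 = ((instances[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = instances[assign == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention over prototypes with a residual connection."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return tokens + weights @ v


def mixer_block(tokens, w_tok1, w_tok2, w_ch1, w_ch2):
    """Simplified Mixer block: token mixing, then channel mixing (ReLU, no norm)."""
    # Token mixing: an MLP applied along the prototype axis, shared over channels.
    mixed = np.maximum(tokens.T @ w_tok1, 0.0) @ w_tok2  # (d, n)
    tokens = tokens + mixed.T
    # Channel mixing: an MLP applied along the feature axis, shared over tokens.
    return tokens + np.maximum(tokens @ w_ch1, 0.0) @ w_ch2


def demo(seed=0, d=16, k=4):
    """Run the sketch on random features standing in for two magnifications."""
    rng = np.random.default_rng(seed)
    bags = {"20x": rng.normal(size=(200, d)), "5x": rng.normal(size=(50, d))}
    wq, wk, wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
    # Per scale: replace instances by prototypes, then re-calibrate them.
    protos = [
        self_attention(kmeans_prototypes(x, k, seed=seed), wq, wk, wv)
        for x in bags.values()
    ]
    tokens = np.concatenate(protos, axis=0)  # (num_scales * k, d)
    n = tokens.shape[0]
    w_tok1 = rng.normal(scale=0.1, size=(n, n))
    w_tok2 = rng.normal(scale=0.1, size=(n, n))
    w_ch1 = rng.normal(scale=0.1, size=(d, d))
    w_ch2 = rng.normal(scale=0.1, size=(d, d))
    fused = mixer_block(tokens, w_tok1, w_tok2, w_ch1, w_ch2)
    return fused.mean(axis=0)  # slide-level embedding for a classifier head
```

Calling `demo()` returns one `d`-dimensional vector per slide. The point of the sketch is the change in sequence length: a bag with hundreds of instances is reduced to `k` prototypes per scale before any attention is computed, so the attention and mixing steps operate on `num_scales * k` tokens rather than on the full bag.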