Whole slide image (WSI) assessment is a challenging and crucial step in cancer diagnosis and treatment planning. WSIs require high magnifications to facilitate sub-cellular analysis. Precise annotations for patch- or even pixel-level classifications in the context of gigapixel WSIs are tedious to acquire and require domain experts. Coarse-grained labels, on the other hand, are easily accessible, which makes WSI classification an ideal use case for multiple instance learning (MIL). In our work, we propose a novel embedding-based Dual-Query MIL pipeline (DQ-MIL). We contribute to both the embedding and aggregation steps. Since all-purpose visual feature representations are not yet available, embedding models are currently limited in terms of generalizability. With our work, we explore the potential of dynamic meta-embedding based on cutting-edge self-supervised pre-trained models in the context of MIL. Moreover, we propose a new MIL architecture capable of combining MIL-attention with correlated self-attention. The Dual-Query Perceiver design of our approach allows us to leverage the concept of self-distillation and to combine the advantages of a small model in the context of a low data regime with the rich feature representation of a larger model. We demonstrate the superior performance of our approach on three histopathological datasets, where we show improvement of up to 10% over state-of-the-art approaches.
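For readers unfamiliar with the aggregation step, the following is a minimal sketch of generic attention-based MIL pooling, in which patch embeddings from one slide (a "bag") are combined into a single slide-level vector using learned attention weights. It is not the DQ-MIL architecture itself, and it omits the self-attention and Dual-Query Perceiver components. The function name `attention_mil_pool`, the parameters `V` and `w`, and the toy values are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(bag, V, w):
    """Pool a bag of patch embeddings into one slide-level embedding.

    bag: list of K patch embeddings, each a list of D floats.
    V:   L x D projection matrix (hypothetical learned parameter).
    w:   length-L scoring vector (hypothetical learned parameter).
    Returns (pooled_embedding, attention_weights).
    """
    scores = []
    for h in bag:
        # Hidden representation tanh(V h) for this patch.
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Scalar attention score w^T tanh(V h).
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(bag[0])
    # Attention-weighted sum of the patch embeddings.
    pooled = [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]
    return pooled, weights


# Toy bag of three 2-dimensional patch embeddings (made-up values).
bag = [[0.1, 0.0], [0.0, 0.2], [2.0, 2.0]]
V = [[1.0, 1.0]]  # single hidden unit, for illustration only
w = [1.0]
pooled, weights = attention_mil_pool(bag, V, w)
```

Because the pooled vector is a weighted sum over patches, it does not depend on patch order, which is why only a slide-level (coarse-grained) label is needed to train such an aggregator.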