Whole slide image (WSI) assessment is a challenging and crucial step in cancer diagnosis and treatment planning. WSIs must be examined at high magnification to enable sub-cellular analysis. Precise patch- or even pixel-level annotations of gigapixel WSIs are tedious to acquire and require domain experts. Coarse-grained labels, on the other hand, are easily accessible, which makes WSI classification an ideal use case for multiple instance learning (MIL). In this work, we propose a novel embedding-based Dual-Query MIL pipeline (DQ-MIL) that contributes to both the embedding and the aggregation step. Since all-purpose visual feature representations are not yet available, current embedding models are limited in their generalizability. We therefore explore the potential of dynamic meta-embedding built on cutting-edge self-supervised pre-trained models in the context of MIL. Moreover, we propose a new MIL architecture that combines MIL attention with correlated self-attention. The Dual-Query Perceiver design of our approach allows us to leverage self-distillation and to combine the advantages of a small model in a low-data regime with the rich feature representation of a larger model. We demonstrate the superior performance of our approach on three histopathological datasets, where it improves on state-of-the-art approaches by up to 10%.
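The two generic building blocks named above, dynamic meta-embedding and MIL attention, can be illustrated with a minimal sketch. It is not the DQ-MIL architecture: it omits the Dual-Query Perceiver, correlated self-attention, and self-distillation. It assumes each patch is embedded by several backbones, fuses those views with softmax-weighted linear projections (the usual dynamic meta-embedding scheme), and then pools the fused patch embeddings into one bag (slide) embedding with ungated attention-based MIL pooling. All function names, dimensions, and weight values are hypothetical and fixed for illustration; in practice the weights would be learned.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # list of rows; number of rows = output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_meta_embedding(views: List[Vector], projections: List[Matrix],
                           score_vec: Vector) -> Vector:
    """Fuse one patch's embeddings from several (hypothetical) backbones.

    Each view is projected to a shared dimension, scored with a shared
    vector, and the projections are summed with softmax weights over views.
    """
    projected = [matvec(p, v) for p, v in zip(projections, views)]
    alphas = softmax([dot(score_vec, h) for h in projected])
    dim = len(projected[0])
    return [sum(a * h[i] for a, h in zip(alphas, projected)) for i in range(dim)]


def attention_mil_pool(instances: List[Vector], v_mat: Matrix,
                       w_vec: Vector) -> Tuple[Vector, Vector]:
    """Ungated attention MIL pooling: a_k = softmax_k(w^T tanh(V h_k))."""
    scores = [dot(w_vec, [math.tanh(x) for x in matvec(v_mat, h)])
              for h in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(a * h[i] for a, h in zip(weights, instances)) for i in range(dim)]
    return bag, weights


if __name__ == "__main__":
    # Hypothetical bag: 3 patches, each embedded by two backbones (dims 3 and 2).
    bag_views = [
        [[0.2, 0.1, 0.5], [0.9, 0.3]],
        [[0.7, 0.4, 0.0], [0.1, 0.8]],
        [[0.3, 0.9, 0.2], [0.5, 0.5]],
    ]
    # Illustrative fixed projection weights (3 -> 2 and 2 -> 2).
    projections = [
        [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    score_vec = [0.5, -0.5]
    fused = [dynamic_meta_embedding(v, projections, score_vec) for v in bag_views]
    bag_embedding, attn = attention_mil_pool(fused, [[1.0, -1.0], [0.5, 0.5]], [1.0, 1.0])
    print("attention weights:", [round(a, 3) for a in attn])
    print("bag embedding:", [round(x, 3) for x in bag_embedding])
```

Only the slide-level label supervises such a model; the attention weights indicate which patches contributed most to the bag embedding, which is what makes coarse labels usable without patch-level annotation.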