Current multi-instance learning algorithms for pathology image analysis typically require a substantial number of Whole Slide Images (WSIs) for effective training and exhibit suboptimal performance when learning data are limited. In clinical settings, restricted access to pathology slides is inevitable due to patient privacy concerns and the prevalence of rare or emerging diseases. Few-shot Weakly Supervised WSI Classification has emerged to address the dual challenge of limited slide data and sparse slide-level labels for diagnosis. Prompt learning based on pre-trained models (\eg, CLIP) is a promising scheme for this setting; however, research in this area remains limited, and existing algorithms often focus solely on patch-level prompts or confine themselves to language prompts. This paper proposes a multi-instance prompt learning framework enhanced with pathology knowledge, \ie, one that integrates visual and textual prior knowledge into prompts at both the patch and slide levels. The training process combines static and learnable prompts, effectively guiding the activation of pre-trained models and further facilitating the diagnosis of key pathology patterns. Lightweight Messenger (self-attention) and Summary (attention-pooling) layers are introduced to model the relationships between patches and slides within the same patient's data. Additionally, alignment-wise contrastive losses enforce feature-level alignment between visual and textual learnable prompts at both the patch and slide levels. Our method demonstrates superior performance on three challenging clinical tasks, significantly outperforming comparative few-shot methods.
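The three components named in the abstract can be illustrated with a minimal NumPy sketch: a Messenger layer as single-head self-attention over patch embeddings, a Summary layer as attention pooling into a slide embedding, and a symmetric contrastive loss aligning visual and textual prompt features. All function names, the single-head simplification, and the temperature value are hypothetical, chosen for illustration; the paper's actual layers may differ in dimensionality, heads, and parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def messenger_self_attention(patches, Wq, Wk, Wv):
    """Messenger layer sketch: single-head self-attention over (N, d) patch embeddings."""
    q, k, v = patches @ Wq, patches @ Wk, patches @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) patch-to-patch weights
    return attn @ v                                  # contextualized patches (N, d)

def summary_attention_pool(patches, w):
    """Summary layer sketch: attention-pool (N, d) patches into one slide embedding (d,)."""
    scores = softmax(patches @ w)  # (N,) attention weights from a learnable query w
    return scores @ patches

def alignment_contrastive_loss(vis, txt, tau=0.07):
    """Symmetric InfoNCE between L2-normalized visual/textual prompt features (B, d)."""
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = vis @ txt.T / tau                 # cosine similarities scaled by temperature
    idx = np.arange(len(vis))                  # matched pairs lie on the diagonal
    loss_v2t = -np.log(softmax(logits, axis=1)[idx, idx]).mean()
    loss_t2v = -np.log(softmax(logits, axis=0)[idx, idx]).mean()
    return (loss_v2t + loss_t2v) / 2
```

In this sketch the Messenger output feeds the Summary pool, yielding one embedding per slide, and the contrastive loss is applied separately to patch-level and slide-level visual/textual prompt pairs.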