The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification

This paper introduces few-shot weakly supervised learning for pathology Whole Slide Image (WSI) classification, denoted FSWC, and proposes a solution based on prompt learning and a large language model, GPT-4. Because a WSI is too large to process directly and must be divided into patches, WSI classification is commonly treated as a Multiple Instance Learning (MIL) problem: each WSI is a bag, and its patches are instances. The objective of FSWC is to classify both bags and instances given only a small number of labeled bags. Unlike conventional few-shot learning problems, FSWC poses additional challenges because only weak bag-level labels are available within the MIL framework. Inspired by the recent success of vision-language (V-L) models in downstream few-shot classification tasks, we propose a two-level prompt learning MIL framework tailored to pathology that incorporates language prior knowledge. Specifically, we use CLIP to extract an instance feature for each patch and introduce a prompt-guided pooling strategy to aggregate these instance features into a bag feature. We then use a small number of labeled bags for few-shot prompt learning on the bag features. We query GPT-4 in a question-and-answer mode to obtain language prior knowledge at both the instance and bag levels, which is then integrated into the instance-level and bag-level language prompts. In addition, a learnable component of the language prompts is trained on the available few-shot labeled data. Extensive experiments on three real WSI datasets covering breast cancer, lung cancer, and cervical cancer demonstrate the notable performance of the proposed method in both bag and instance classification. All code will be made publicly available.
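
To make the two-level idea concrete, the following is a minimal sketch of prompt-guided pooling followed by bag-level classification. It is not the paper's implementation: the abstract does not specify the pooling rule, so the sketch assumes each patch is weighted by a softmax over its best cosine similarity to the instance-level prompt embeddings. The function names (`prompt_guided_pooling`, `classify_bag`), the temperature value, and the random vectors standing in for CLIP image and text embeddings are all hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_guided_pooling(instance_feats, instance_prompts, temperature=0.1):
    """Aggregate patch (instance) features into one bag feature.

    Assumption: a patch's weight comes from its highest cosine similarity to
    any instance-level prompt embedding; the paper's actual rule may differ.
    """
    feats = [l2_normalize(f) for f in instance_feats]
    prompts = [l2_normalize(p) for p in instance_prompts]
    relevance = [max(dot(f, p) for p in prompts) for f in feats]
    weights = softmax(relevance, temperature)
    dim = len(feats[0])
    bag = [sum(w * f[d] for w, f in zip(weights, feats)) for d in range(dim)]
    return bag, weights


def classify_bag(bag_feat, bag_prompts, temperature=0.1):
    """Class probabilities from cosine similarity to bag-level prompt embeddings."""
    bag = l2_normalize(bag_feat)
    sims = [dot(bag, l2_normalize(p)) for p in bag_prompts]
    return softmax(sims, temperature)


# Toy data standing in for CLIP embeddings (sizes are hypothetical).
rng = random.Random(0)
dim, num_patches = 8, 5
instance_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
instance_prompts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
bag_prompts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]

bag_feat, weights = prompt_guided_pooling(instance_feats, instance_prompts)
probs = classify_bag(bag_feat, bag_prompts)
print("instance weights:", weights)
print("bag class probabilities:", probs)
```

In this sketch the per-patch weights double as a rough instance-level relevance signal, which is one plausible way a single forward pass could serve both bag and instance classification. In a real setting, the prompt embeddings would come from a text encoder fed with the GPT-4-derived descriptions plus learnable prompt tokens, and only those learnable tokens would be updated from the few labeled bags.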
