Most advances in medical image recognition for clinical auxiliary diagnosis are hampered by the low-resource nature of the medical field, where annotations are expensive and require professional expertise. This low-resource problem can be alleviated by leveraging the transferable representations of large-scale pre-trained vision-language models through relevant medical text prompts. However, existing pre-trained vision-language models require domain experts to carefully design the medical prompts, which greatly increases the burden on clinicians. To address this problem, we propose MedPrompt, a weakly supervised prompt learning method that automatically generates medical prompts. It consists of an unsupervised pre-trained vision-language model and a weakly supervised prompt learning model. The unsupervised pre-trained vision-language model exploits the natural correlation between medical images and their corresponding medical texts for pre-training, without any manual annotations. The weakly supervised prompt learning model uses only the class labels of the images in the dataset to guide the learning of the class-specific vector in the prompt, while the other context vectors in the prompt are learned without manual annotations. To the best of our knowledge, this is the first model to automatically generate medical prompts. With these prompts, the pre-trained vision-language model is freed from the strong expert dependency of manual annotation and manual prompt design. Experimental results show that, with only a minimal number of labeled samples for few-shot learning, the model using our automatically generated prompts outperforms its counterparts that use hand-crafted prompts with full-shot learning, and that it reaches superior or comparable accuracy on zero-shot image classification. The proposed prompt generator is lightweight and can therefore be embedded into any network architecture.
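To make the prompt structure described above concrete, the following is a minimal sketch, not the MedPrompt implementation. It assumes a prompt made of shared context vectors followed by one class-specific vector, and it scores an image feature against each class prompt by cosine similarity followed by a softmax, as is common for vision-language models. The function names (`encode_prompt`, `classify`), the mean-pooling stand-in for the text encoder, the temperature value, and all numeric vectors are hypothetical and chosen only for illustration.

```python
import math


def encode_prompt(context_vectors, class_vector):
    """Stand-in text encoder (assumption): mean-pool the prompt tokens.

    The prompt is the shared context vectors followed by one class vector.
    """
    tokens = context_vectors + [class_vector]
    dim = len(class_vector)
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(image_feature, context_vectors, class_vectors, temperature=0.07):
    """Return one probability per class for a single image feature."""
    logits = [
        cosine(image_feature, encode_prompt(context_vectors, c)) / temperature
        for c in class_vectors
    ]
    # Numerically stable softmax over the per-class logits.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy values: two shared context vectors and two class vectors.
context = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
class_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_feature = [0.1, 0.9, 0.0]

probs = classify(image_feature, context, class_vectors)
predicted = probs.index(max(probs))
```

In this sketch the context vectors are shared by all classes and only the final vector differs per class, which mirrors the split in the abstract between context vectors learned without manual annotations and a class-specific vector guided by class labels. How either set of vectors is actually trained is not shown here.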