In this paper, we propose a novel approach, called GPT4MIA, that uses a Generative Pre-trained Transformer (GPT) as a plug-and-play transductive inference tool for medical image analysis (MIA). We provide a theoretical analysis of why a large pre-trained language model such as GPT-3 can serve as a plug-and-play transductive inference model for MIA. At the methodological level, we develop several technical treatments to improve the efficiency and effectiveness of GPT4MIA, including better prompt structure design, sample selection, and prompt ordering of representative samples/features. We present two concrete use cases of GPT4MIA, each with its workflow: (1) detecting prediction errors and (2) improving prediction accuracy. Both work in conjunction with well-established vision-based image classification models (e.g., ResNet). Experiments validate that the proposed method is effective for both tasks. We further discuss the opportunities and challenges of using Transformer-based large language models for broader MIA applications.
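The abstract does not specify the prompt format, so the following is only a minimal sketch of the general idea behind prompting a GPT model for transductive inference. Labeled feature vectors, such as a vision model's softmax outputs, are rendered as text examples, and the unlabeled query comes last so the language model can complete its label. Every helper name, the "Input: ... -> Label:" template, the "closest to the class mean" selection rule, and the class-interleaving order are hypothetical illustrations, not the GPT4MIA implementation. The example labels ("correct"/"wrong") assume the error-detection use case.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# (feature vector, label); e.g. a classifier's softmax scores plus a tag (assumption)
Sample = Tuple[List[float], str]


def format_line(features: Sequence[float], label: Optional[str] = None) -> str:
    """Render one sample as text; the query line leaves the label for the LLM to fill in."""
    feats = ", ".join(f"{x:.2f}" for x in features)
    return f"Input: {feats} -> Label:" + (f" {label}" if label is not None else "")


def select_representatives(train: List[Sample], k: int) -> Dict[str, List[Sample]]:
    """Hypothetical sample selection: keep the k samples per class nearest the class mean."""
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for feats, label in train:
        by_class[label].append((feats, label))
    chosen: Dict[str, List[Sample]] = {}
    for label, items in by_class.items():
        dim = len(items[0][0])
        mean = [sum(f[i] for f, _ in items) / len(items) for i in range(dim)]

        def sq_dist(sample: Sample) -> float:
            # Squared Euclidean distance to this class's mean feature vector
            return sum((a - b) ** 2 for a, b in zip(sample[0], mean))

        chosen[label] = sorted(items, key=sq_dist)[:k]
    return chosen


def interleave(groups: Dict[str, List[Sample]]) -> List[Sample]:
    """Hypothetical prompt ordering: alternate classes so no label dominates a stretch."""
    lists = [groups[c] for c in sorted(groups)]
    ordered: List[Sample] = []
    for i in range(max(len(g) for g in lists)):
        for g in lists:
            if i < len(g):
                ordered.append(g[i])
    return ordered


def build_prompt(train: List[Sample], query: Sequence[float], k: int = 2) -> str:
    """Assemble labeled examples followed by the unlabeled query line."""
    examples = interleave(select_representatives(train, k))
    lines = [format_line(f, y) for f, y in examples]
    lines.append(format_line(query))
    return "\n".join(lines)


if __name__ == "__main__":
    train = [
        ([0.95, 0.05], "correct"),
        ([0.90, 0.10], "correct"),
        ([0.55, 0.45], "wrong"),
        ([0.60, 0.40], "wrong"),
        ([0.99, 0.01], "correct"),
    ]
    print(build_prompt(train, [0.58, 0.42]))
```

The resulting string would be sent to a GPT-style completion model, and the generated token after the final "Label:" would be read as the prediction. Selection and ordering are kept as separate functions so alternative strategies can be swapped in without touching the prompt template.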