In this paper, we propose GPT4MIA, a novel approach that uses a Generative Pre-trained Transformer (GPT) as a plug-and-play transductive inference tool for medical image analysis (MIA). We provide a theoretical analysis of why a large pre-trained language model such as GPT-3 can serve as a plug-and-play transductive inference model for MIA. At the methodological level, we develop several techniques to improve the efficiency and effectiveness of GPT4MIA, including better prompt structure design, sample selection, and prompt ordering of representative samples/features. We present two concrete use cases of GPT4MIA, each with its workflow: (1) detecting prediction errors and (2) improving prediction accuracy, working in conjunction with well-established vision-based models for image classification (e.g., ResNet). Experiments validate that the proposed method is effective for both tasks. We further discuss the opportunities and challenges in utilizing Transformer-based large language models for broader MIA applications.
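To make the idea of prompt structure design, sample selection, and prompt ordering more concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes each image has already been reduced to a short feature vector (for example, by a vision model such as ResNet), selects the labeled samples nearest to the query, and orders them so that the most similar sample sits directly before the query. The function names, the Euclidean distance, the text format, and the ordering rule are all illustrative assumptions; the abstract does not specify them. Sending the resulting prompt to a language model is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def format_features(feats):
    """Render a feature vector as space-separated values with two decimals."""
    return " ".join(f"{v:.2f}" for v in feats)


def build_prompt(labeled, query, k=3):
    """Build a few-shot text prompt for one query sample (hypothetical format).

    labeled: list of (feature_vector, label) pairs with known labels.
    query:   feature vector of the sample to be labeled.
    k:       number of labeled samples to include in the prompt.
    """
    # Sample selection (assumed rule): keep the k labeled samples closest
    # to the query in feature space.
    nearest = sorted(labeled, key=lambda s: euclidean(s[0], query))[:k]

    # Prompt ordering (assumed rule): least similar first, so that the
    # most similar example ends up directly before the query line.
    nearest.reverse()

    lines = [
        f"Features: {format_features(feats)} -> Label: {label}"
        for feats, label in nearest
    ]
    # The query line is left open for the language model to complete.
    lines.append(f"Features: {format_features(query)} -> Label:")
    return "\n".join(lines)
```

The returned string ends with an open `Label:` field; under these assumptions, the model's completion would be read as the predicted label, which could then be compared with the vision model's own prediction to flag possible errors.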