We present a comprehensive evaluation of Parameter-Efficient Fine-Tuning (PEFT) techniques for diverse medical image analysis tasks. PEFT is increasingly used for knowledge transfer from pre-trained models in natural language processing, vision, speech, and cross-modal tasks such as vision-language and text-to-image generation. However, its application in medical image analysis remains relatively unexplored. As foundation models are increasingly adopted in the medical domain, it is crucial to investigate and compare knowledge transfer strategies that can bolster a range of downstream tasks. Our study, the first of its kind to the best of our knowledge, evaluates 16 distinct PEFT methods proposed for convolutional and transformer-based networks, focusing on image classification and text-to-image generation across six medical datasets that vary in size, modality, and complexity. Through more than 600 controlled experiments, we demonstrate performance gains of up to 22% in certain scenarios and show the efficacy of PEFT for medical text-to-image generation. Further, by studying the relationship between PEFT performance and downstream data volume, we identify the settings in which PEFT methods particularly outperform conventional fine-tuning.