We present a comprehensive evaluation of Parameter-Efficient Fine-Tuning (PEFT) techniques for diverse medical image analysis tasks. PEFT is increasingly used to transfer knowledge from pre-trained models in natural language processing, vision, speech, and cross-modal tasks such as vision-language and text-to-image generation. However, its application in medical image analysis remains relatively unexplored. As foundation models gain adoption in the medical domain, it is crucial to investigate and compare knowledge-transfer strategies that can support a range of downstream tasks. Our study, to the best of our knowledge the first of its kind, evaluates 16 distinct PEFT methods proposed for convolutional and transformer-based networks, focusing on image classification and text-to-image generation across six medical datasets that vary in size, modality, and complexity. Through more than 600 controlled experiments, we demonstrate performance gains of up to 22% in certain scenarios and show the efficacy of PEFT for medical text-to-image generation. Further, by studying the relationship between PEFT performance and downstream data volume, we identify the settings in which PEFT methods most clearly outperform conventional fine-tuning.