Despite the significant potential of Foundation Models (FMs) in medical imaging, their application to prognosis prediction remains challenging due to data scarcity, class imbalance, and task complexity, which limit their clinical adoption. This study introduces the first structured benchmark to assess the robustness and efficiency of transfer learning strategies for FMs compared with convolutional neural networks (CNNs) in predicting COVID-19 patient outcomes from chest X-rays. The goal is to systematically compare fine-tuning strategies, both classical and parameter-efficient, under realistic clinical constraints of data scarcity and class imbalance, offering empirical guidance for AI deployment in clinical workflows. Four publicly available COVID-19 chest X-ray datasets were used, covering mortality, severity, and ICU admission, with varying sample sizes and class imbalances. CNNs pretrained on ImageNet and FMs pretrained on general or biomedical datasets were adapted using full fine-tuning, linear probing, and parameter-efficient methods. Models were evaluated under full-data and few-shot regimes using the Matthews Correlation Coefficient (MCC) and Precision-Recall AUC (PR-AUC), with cross-validation and class-weighted losses. CNNs with full fine-tuning performed robustly on small, imbalanced datasets, while FMs with Parameter-Efficient Fine-Tuning (PEFT), particularly LoRA and BitFit, achieved competitive results on larger datasets. Severe class imbalance degraded PEFT performance, whereas balanced data mitigated this effect. In few-shot settings, FMs showed limited generalization, with linear probing yielding the most stable results. No single fine-tuning strategy proved universally optimal: CNNs remain dependable for low-resource scenarios, whereas FMs benefit from parameter-efficient methods when data are sufficient.
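The abstract names MCC as an evaluation metric and class-weighted losses as the way imbalance is handled, but it does not give the formulas used. The sketch below is illustrative only: it computes the standard binary MCC from confusion-matrix counts and derives inverse-frequency class weights. The function names `mcc` and `balanced_class_weights` are hypothetical, and the inverse-frequency weighting scheme is an assumption, not necessarily the one used in the study.

```python
import math
from collections import Counter


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed scheme for illustration; the abstract does not specify one.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 9 negatives, 1 positive.
    y_true = [0] * 9 + [1]
    always_negative = [0] * 10
    # A trivial majority-class predictor is 90% accurate but has MCC 0.
    print(mcc(y_true, always_negative))
    print(balanced_class_weights(y_true))
```

The majority-class case shows why MCC suits imbalanced prognosis tasks: a predictor that never flags the rare outcome gets no credit, even though its accuracy is high. The weights from `balanced_class_weights` could be passed to a weighted cross-entropy loss so that errors on the rare class cost more.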