The advent of foundation models (FMs) is transforming the medical domain. In ophthalmology, RETFound, a retina-specific FM pre-trained sequentially on 1.4 million natural images and 1.6 million retinal images, has demonstrated high adaptability across clinical applications. By contrast, DINOv2, a general-purpose vision FM pre-trained on 142 million natural images, has shown promise in non-medical domains, but its applicability to clinical tasks remains underexplored. To address this gap, we conducted head-to-head evaluations by fine-tuning RETFound and three DINOv2 models (large, base, and small) for ocular disease detection and systemic disease prediction. The evaluations used eight standardized open-source ocular datasets, as well as the Moorfields AlzEye and UK Biobank datasets. The DINOv2-large model outperformed RETFound in detecting diabetic retinopathy (AUROC = 0.850-0.952 vs. 0.823-0.944 across three datasets, all P ≤ 0.007) and multi-class eye diseases (AUROC = 0.892 vs. 0.846, P < 0.001). For glaucoma detection, the DINOv2-base model outperformed RETFound (AUROC = 0.958 vs. 0.940, P < 0.001). Conversely, RETFound outperformed all DINOv2 models in predicting heart failure, myocardial infarction, and ischaemic stroke (AUROC = 0.732-0.796 vs. 0.663-0.771, all P < 0.001). These trends persisted when only 10% of the fine-tuning data were used. These findings identify distinct scenarios in which general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimise clinical performance.