Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases?

Qingshan Hou, Yukun Zhou, Jocelyn Hui Lin Goh, Ke Zou, Samantha Min Er Yew, Sahana Srinivasan, Meng Wang, Thaddaeus Lo, Xiaofeng Lei, Siegfried K. Wagner, Mark A. Chia, Dawei Yang, Hongyang Jiang, AnRan Ran, Rui Santos, Gabor Mark Somfai, Juan Helen Zhou, Haoyu Chen, Qingyu Chen, Carol Yim-Lui Cheung, Pearse A. Keane, Yih Chung Tham

The advent of foundation models (FMs) is transforming the medical domain. In ophthalmology, RETFound, a retina-specific FM pre-trained sequentially on 1.4 million natural images and 1.6 million retinal images, has demonstrated high adaptability across clinical applications. Conversely, DINOv2, a general-purpose vision FM pre-trained on 142 million natural images, has shown promise in non-medical domains; however, its applicability to clinical tasks remains underexplored. To address this, we conducted head-to-head evaluations by fine-tuning RETFound and three DINOv2 models (large, base, and small) for ocular disease detection and systemic disease prediction tasks across eight standardized open-source ocular datasets, as well as the Moorfields AlzEye and UK Biobank datasets. The DINOv2-large model outperformed RETFound in detecting diabetic retinopathy (AUROC=0.850-0.952 vs. 0.823-0.944 across three datasets, all P≤0.007) and multi-class eye diseases (AUROC=0.892 vs. 0.846, P<0.001). In glaucoma detection, the DINOv2-base model outperformed RETFound (AUROC=0.958 vs. 0.940, P<0.001). Conversely, RETFound outperformed all DINOv2 models in predicting heart failure, myocardial infarction, and ischaemic stroke (AUROC=0.732-0.796 vs. 0.663-0.771, all P<0.001). These trends persisted even when only 10% of the fine-tuning data was used. These findings delineate the distinct scenarios in which general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimise clinical performance.
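The comparisons above are reported as AUROC values with P-values for two models evaluated on the same test set. As a minimal illustration of what such a paired comparison involves, the sketch below computes AUROC in its rank (Mann-Whitney) form and a two-sided paired bootstrap P-value for the AUROC difference. This is an assumption for illustration only: the abstract does not state which statistical test the study used (DeLong's test is a common alternative), labels are assumed to be binary 0/1, and the helpers `auroc` and `paired_bootstrap_p` are hypothetical names.

```python
import random
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Assumes binary labels (1 = positive, 0 = negative); tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_bootstrap_p(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 2000,
    seed: int = 0,
) -> float:
    """Approximate two-sided P-value for AUROC(A) - AUROC(B) via a paired bootstrap.

    Hypothetical helper: both models are scored on the SAME resampled cases,
    and the P-value is twice the smaller tail fraction of differences around 0.
    """
    rng = random.Random(seed)
    n = len(labels)
    auroc(labels, scores_a)  # fail fast (ValueError) if the test set lacks a class
    diffs = []
    while len(diffs) < n_boot:
        # Resample case indices with replacement, shared by both models (paired design).
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            a = auroc(y, [scores_a[i] for i in idx])
            b = auroc(y, [scores_b[i] for i in idx])
            diffs.append(a - b)
    frac_le0 = sum(d <= 0 for d in diffs) / n_boot
    frac_ge0 = sum(d >= 0 for d in diffs) / n_boot
    return min(1.0, 2 * min(frac_le0, frac_ge0))
```

Resampling the same case indices for both models keeps the comparison paired, which matters because both FMs are fine-tuned and tested on identical images; the quadratic `auroc` is chosen for readability, and a rank-based or library implementation would be preferable for large test sets.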
