Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, Dianbo Liu, Wendy Wong, Sahil Thakur, Beau Fenner, Danqi Fang, Siying Liu, Qingyun Liu, Yuqiang Huang, Hongqiang Zeng, Yanda Meng, Yukun Zhou, Zehua Jiang, Minghui Qiu, Changqing Zhang, Xinjian Chen, Sophia Y Wang, Cecilia S Lee, Lucia Sobrin, Carol Y Cheung, Chi Pui Pang, Pearse A Keane, Ching-Yu Cheng, Haoyu Chen, Huazhu Fu

Previous foundation models for retinal images were pre-trained on a limited set of disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average performance of 19 ophthalmologists from Singapore, China, and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus diseases. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
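To make the Top-k figures above concrete, the following is a minimal sketch of how Top-k accuracy can be computed for zero-shot recognition with a vision-language model: each image embedding is compared against one text embedding per disease name, and a prediction counts as correct if the true label is among the k most similar names. This is an illustration only and not RetiZero's implementation; the function names, the three-dimensional toy embeddings, and the use of cosine similarity are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k_labels(image_emb, text_embs, k):
    """Return the k disease names whose text embedding best matches the image."""
    ranked = sorted(
        text_embs,
        key=lambda name: cosine(image_emb, text_embs[name]),
        reverse=True,
    )
    return ranked[:k]

def top_k_accuracy(samples, text_embs, k):
    """Fraction of samples whose true label appears among the top-k names."""
    hits = sum(
        1 for emb, label in samples if label in top_k_labels(emb, text_embs, k)
    )
    return hits / len(samples)

# Hypothetical toy embeddings (assumption): a real model would produce
# high-dimensional vectors from an image encoder and a text encoder.
TEXT_EMBS = {
    "diabetic retinopathy": [1.0, 0.0, 0.0],
    "glaucoma": [0.0, 1.0, 0.0],
    "retinal detachment": [0.0, 0.0, 1.0],
}

# (image embedding, true label) pairs, also invented for illustration.
SAMPLES = [
    ([0.9, 0.1, 0.0], "diabetic retinopathy"),
    ([0.5, 0.6, 0.1], "diabetic retinopathy"),
    ([0.1, 0.2, 0.9], "glaucoma"),
    ([0.2, 0.1, 0.8], "glaucoma"),
]

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"Top{k} accuracy: {top_k_accuracy(SAMPLES, TEXT_EMBS, k):.2f}")
```

On this toy data the score rises as k grows, which is why Top3 and Top5 figures are naturally higher than Top1 and are a reasonable fit for settings where the model proposes a short list of candidate diagnoses for a clinician to review.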
