In medical imaging, large-scale public datasets with high-quality annotations are rare because of data privacy constraints and annotation cost. To address this issue, we release SynFundus-1M, a high-quality synthetic dataset containing over \textbf{1 million} fundus images covering 11 disease types. Moreover, we intentionally diversify the readability of the images and accordingly provide four types of quality scores for each image. To the best of our knowledge, SynFundus-1M is currently the largest fundus dataset with the most sophisticated annotations. All images are generated by a Denoising Diffusion Probabilistic Model named SynFundus-Generator. Trained on over 1.3 million private fundus images, SynFundus-Generator achieves significantly superior performance in generating fundus images compared with recent related works. Furthermore, when we blend synthetic images from SynFundus-1M with real fundus images, ophthalmologists can hardly distinguish the synthetic images from the real ones. Through extensive experiments, we demonstrate that both convolutional neural networks (CNNs) and Vision Transformers (ViTs) can benefit from SynFundus-1M, either by pretraining on it or by training on it directly. Compared to datasets like ImageNet or EyePACS, models trained on SynFundus-1M achieve not only better performance but also faster convergence on various downstream tasks.