Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice typically faces challenges such as data heterogeneity, potential invasiveness, and registration complexity. We therefore propose a unified framework that integrates multimodal data synthesis and fusion for retinal disease classification and grading. Specifically, the synthesized multimodal data comprise fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that highlight latent lesions as well as optic disc/cup regions. Parallel models are trained independently to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities, pruning redundant information and flexibly integrating the remainder according to the downstream task. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrate the superiority of our approach over state-of-the-art methods in multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (accuracy: 0.842, Kappa: 0.861). This work not only improves the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across diverse medical imaging modalities.
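The adaptive cross-modality calibration described above can be illustrated schematically. The abstract does not specify the exact mechanism, so the sketch below is only a minimal, hypothetical example: each modality (FFA, MSI, saliency) contributes a feature vector, modalities are scored, and a softmax gate reweights them before fusion. The `fuse` function and the norm-based scoring are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(features, temperature=1.0):
    """Adaptively weight and fuse per-modality feature vectors.

    features: dict mapping modality name -> 1-D feature vector (equal length).
    Returns the fused vector and the per-modality weights.
    Scoring by feature norm is a placeholder for a learned gating network.
    """
    names = list(features)
    mats = np.stack([features[n] for n in names])      # (num_modalities, dim)
    scores = np.linalg.norm(mats, axis=1) / temperature
    weights = softmax(scores)                          # soft information pruning
    fused = (weights[:, None] * mats).sum(axis=0)      # weighted integration
    return fused, dict(zip(names, weights))
```

In a trained system the gating scores would come from a small network conditioned on the downstream task, so that uninformative modalities are down-weighted (pruned) rather than discarded outright.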