Existing multi-modal learning methods on fundus and OCT images mostly require both modalities to be available and strictly paired for training and testing, which is impractical in many clinical scenarios. To expand the scope of clinical applications, we formulate a novel setting, "OCT-enhanced disease recognition from fundus images", which allows the use of unpaired multi-modal data during training and relies only on widely available fundus photographs at test time. To benchmark this setting, we present the first large multi-modal multi-class dataset for eye disease diagnosis, MultiEYE, and propose an OCT-assisted Conceptual Distillation Approach (OCT-CoDA), which employs semantically rich concepts to extract disease-related knowledge from OCT images and transfer it to the fundus model. Specifically, we regard the image-concept relation as a link for distilling useful knowledge from the OCT teacher model to the fundus student model, which considerably improves diagnostic performance based on fundus images and renders the cross-modal knowledge transfer explainable. Through extensive experiments on the multi-disease classification task, our proposed OCT-CoDA demonstrates remarkable performance and interpretability, showing great potential for clinical application. Our dataset and code are available at https://github.com/xmed-lab/MultiEYE.
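The image-concept link described above can be illustrated with a minimal sketch: both teacher (OCT) and student (fundus) image features are scored against a shared set of concept embeddings, and the student is trained to match the teacher's concept distribution via a KL divergence. All function names, the cosine-similarity scoring, and the temperature parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def _softmax(x, t):
    # Temperature-scaled softmax, numerically stabilized.
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def concept_distill_loss(fundus_feat, oct_feat, concept_embs, t=2.0):
    """Hypothetical concept-level distillation objective.

    Scores each image feature against every concept embedding
    (cosine similarity), then penalizes the KL divergence between
    the teacher's (OCT) and student's (fundus) concept distributions.
    """
    def concept_sims(feat):
        f = feat / np.linalg.norm(feat)
        c = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
        return c @ f  # one cosine similarity per concept

    p_teacher = _softmax(concept_sims(oct_feat), t)
    p_student = _softmax(concept_sims(fundus_feat), t)
    # KL(teacher || student): zero when the student matches the teacher.
    return float(np.sum(p_teacher * np.log(p_teacher / p_student)))
```

In this sketch, minimizing the loss pulls the fundus model's concept activations toward those of the OCT model, so the transferred knowledge can be inspected concept by concept.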