The use of multimodal imaging has led to significant improvements in the diagnosis and treatment of many diseases. As in clinical practice, several works have demonstrated the benefits of multimodal fusion for automatic segmentation and classification using deep learning-based methods. However, current segmentation methods are limited to the fusion of modalities with the same dimensionality (e.g., 3D+3D, 2D+2D), which is not always possible, and the fusion strategies implemented by classification methods are incompatible with localization tasks. In this work, we propose a novel deep learning-based framework for the fusion of multimodal data with heterogeneous dimensionality (e.g., 3D+2D) that is compatible with localization tasks. The proposed framework extracts the features of the different modalities and projects them into a common feature subspace. The projected features are then fused and further processed to obtain the final prediction. The framework was validated on two tasks in multimodal retinal imaging: segmentation of geographic atrophy (GA), a late-stage manifestation of age-related macular degeneration, and segmentation of retinal blood vessels (RBV). Our results show that the proposed method outperforms the state-of-the-art monomodal methods on GA and RBV segmentation by up to 3.10% and 4.64% Dice, respectively.
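The project-then-fuse idea described above can be illustrated with a minimal sketch. It is not the proposed framework: the abstract does not specify how the projection is learned, so the sketch assumes a plain mean over the depth axis as a stand-in for the learned projection, and channel-wise stacking as a stand-in for the fusion step. The function names `project_3d_to_2d`, `fuse`, and `dice` are hypothetical. The Dice function follows the standard definition, 2|A∩B| / (|A| + |B|), for binary masks.

```python
from statistics import fmean


def project_3d_to_2d(volume):
    """Collapse a depth x height x width volume to a height x width map.

    Assumption: averaging over depth stands in for the learned projection
    into the common feature subspace; the paper's operator is not given here.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [fmean(volume[d][y][x] for d in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def fuse(projected, features_2d):
    """Stack two height x width maps channel-wise (one 2-tuple per pixel)."""
    same_shape = len(projected) == len(features_2d) and all(
        len(a) == len(b) for a, b in zip(projected, features_2d)
    )
    if not same_shape:
        raise ValueError("feature maps must share the same spatial size")
    return [list(zip(row_p, row_q)) for row_p, row_q in zip(projected, features_2d)]


def dice(pred, target):
    """Dice coefficient between two binary masks given as nested 0/1 lists."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    intersection = sum(p * t for p, t in zip(flat_p, flat_t))
    total = sum(flat_p) + sum(flat_t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy data: a 2 x 2 x 2 volume and a 2 x 2 image on the same spatial grid.
volume = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [5.0, 6.0]]]
image_2d = [[0.0, 1.0], [1.0, 0.0]]

projected = project_3d_to_2d(volume)
fused = fuse(projected, image_2d)
score = dice([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Once both modalities lie on the same 2D grid, every fused pixel carries information from both, which is what keeps the fusion compatible with a localization task such as segmentation. A real implementation would replace the mean and the stacking with learned layers.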