Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common in medical data because of patient-specific circumstances, and they degrade the performance of multi-modal models in practical applications. It is therefore critical to make models robust to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data that is robust to missing modalities and further improves disease-diagnosis performance. Three modalities were fused: chest X-ray radiographs (image modality), radiology reports (text modality), and structured values (tabular modality). Each pair of modalities was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced during training to improve the model's robustness to missing modalities at inference time. Finally, we designed comparison and ablation experiments to validate the effectiveness of the fusion, the robustness to missing modalities, and the contribution of each key component. Experiments were conducted on the MIMIC-IV and MIMIC-CXR datasets for a 14-label disease diagnosis task. Model performance was evaluated with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The experimental results demonstrated that the proposed architecture effectively fused the three modalities and showed strong robustness to missing modalities. The method could potentially be extended to additional modalities to enhance the clinical practicality of the model.
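The abstract does not specify the internals of the bi-modal modules or the exact form of the multivariate loss. The following is only a minimal NumPy sketch of the general idea, under these assumptions: each modality has already been encoded into a sequence of `D`-dimensional token embeddings; each bi-modal module is a single-head, bidirectional cross-attention block (one ingredient of a Transformer layer, without feed-forward sublayers or layer normalization) followed by mean pooling; the tri-modal output averages whichever bi-modal outputs are available; and the "multivariate" loss is assumed to be a sum of binary cross-entropy terms over the tri-modal prediction and each available bi-modal prediction. All class and function names (`CrossAttention`, `BiModalFusion`, `TriModalFusion`, `multi_term_loss`) are hypothetical, and the weights are random and untrained.

```python
import numpy as np

D = 16          # assumed shared embedding width for all modalities
N_LABELS = 14   # 14-label diagnosis task, as in the abstract


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class CrossAttention:
    """Single-head scaled dot-product cross-attention with a residual connection."""

    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv = (rng.normal(0.0, s, (d, d)) for _ in range(3))

    def __call__(self, query_tokens, context_tokens):
        q = query_tokens @ self.wq
        k = context_tokens @ self.wk
        v = context_tokens @ self.wv
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_query, n_context)
        return query_tokens + weights @ v


class BiModalFusion:
    """Hypothetical bi-modal block: each modality attends to the other, then both are mean-pooled."""

    def __init__(self, d, rng):
        self.a_attends_b = CrossAttention(d, rng)
        self.b_attends_a = CrossAttention(d, rng)

    def __call__(self, a, b):
        pooled_a = self.a_attends_b(a, b).mean(axis=0)
        pooled_b = self.b_attends_a(b, a).mean(axis=0)
        return (pooled_a + pooled_b) / 2.0  # one (d,) vector per modality pair


class TriModalFusion:
    """Hypothetical tri-modal model built from three bi-modal blocks; missing pairs are skipped."""

    PAIRS = (("image", "text"), ("image", "tabular"), ("text", "tabular"))

    def __init__(self, d, n_labels, rng):
        self.blocks = {pair: BiModalFusion(d, rng) for pair in self.PAIRS}
        self.head = rng.normal(0.0, 1.0 / np.sqrt(d), (d, n_labels))  # shared linear classifier

    def classify(self, feature):
        return sigmoid(feature @ self.head)  # independent probability per label (multi-label)

    def __call__(self, inputs):
        present = {name: x for name, x in inputs.items() if x is not None}
        if not present:
            raise ValueError("at least one modality is required")
        pair_feats = {
            pair: self.blocks[pair](present[pair[0]], present[pair[1]])
            for pair in self.PAIRS
            if pair[0] in present and pair[1] in present
        }
        if pair_feats:
            fused = np.mean(list(pair_feats.values()), axis=0)
        else:
            fused = next(iter(present.values())).mean(axis=0)  # single modality: pool its tokens
        return self.classify(fused), pair_feats


def bce(probs, labels, eps=1e-7):
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def multi_term_loss(model, inputs, labels):
    """Assumed 'multivariate' loss: tri-modal BCE plus one BCE per available bi-modal output."""
    probs, pair_feats = model(inputs)
    return bce(probs, labels) + sum(bce(model.classify(f), labels) for f in pair_feats.values())


rng = np.random.default_rng(0)
model = TriModalFusion(D, N_LABELS, rng)
sample = {
    "image": rng.normal(size=(49, D)),    # e.g. patch embeddings of a chest radiograph
    "text": rng.normal(size=(32, D)),     # e.g. token embeddings of a radiology report
    "tabular": rng.normal(size=(10, D)),  # e.g. one embedding per structured value
}
labels = rng.integers(0, 2, size=N_LABELS).astype(float)

full_probs, _ = model(sample)
partial_probs, partial_pairs = model(dict(sample, text=None))  # report missing at inference
print(full_probs.shape, partial_probs.shape, list(partial_pairs))
print(multi_term_loss(model, sample, labels))
```

In this sketch, skipping every pair that involves a missing modality, rather than zero-filling the absent input, keeps the tri-modal prediction defined for any non-empty subset of modalities, and the per-pair loss terms train each bi-modal branch to predict on its own. Whether the original architecture handles missing inputs this way is not stated in the abstract.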

翻译：融合多模态数据能够提升深度学习模型的性能。然而，由于患者的特异性，医学数据中常出现模态缺失现象，这在实际应用中对多模态模型的性能造成不利影响。因此，使模型适应模态缺失至关重要。本研究旨在开发一种对模态缺失具有鲁棒性、并能进一步提升疾病诊断性能的高效多模态融合架构。研究中融合了图像模态的X光胸片、文本模态的放射学报告以及表格数据模态的结构化数值数据。每对模态通过基于Transformer的双模态融合模块进行融合，三个双模态融合模块继而组合成一个三模态融合框架。此外，在训练过程中引入多变量损失函数，以提升模型在推理过程中对模态缺失的鲁棒性。最终，我们设计了对比实验和消融实验，以验证融合的有效性、对模态缺失的鲁棒性以及每个关键组件的增强效果。实验基于MIMIC-IV和MIMIC-CXR数据集，在14标签疾病诊断任务上进行。采用受试者工作特征曲线下面积（AUROC）和精确率-召回率曲线下面积（AUPRC）评估模型性能。实验结果表明，我们提出的多模态融合架构有效融合了三种模态，并对模态缺失展现出强鲁棒性。该方法有望扩展至更多模态，以增强模型的临床实用性。