Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modalities across clients, also called modality incongruity, has been largely overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with data heterogeneity across participating clients. We particularly inspect whether incongruent MMFL with a mix of unimodal and multimodal clients is more beneficial than unimodal FL. Furthermore, we examine three potential routes for addressing this issue. Firstly, we study the effectiveness of various self-attention mechanisms for incongruity-agnostic information fusion in MMFL. Secondly, we introduce a modality imputation network (MIN), pre-trained on a multimodal client, for modality translation in unimodal clients and investigate its potential for mitigating the missing modality problem. Thirdly, we assess the capability of client-level and server-level regularization techniques to mitigate modality incongruity effects. Experiments are conducted under several MMFL settings on two publicly available real-world datasets, MIMIC-CXR and Open-I, comprising chest X-rays and radiology reports.