Fusing complementary multimodal information is crucial for accurate diagnosis in computational pathology. However, existing multimodal learning approaches require access to users' raw data, which poses substantial privacy risks. Federated Learning (FL) offers a privacy-preserving alternative, but it falls short when hospitals hold heterogeneous (yet possibly overlapping) sets of data modalities. To bridge this gap, we propose a Federated Multi-Modal (FedMM) learning framework that federatedly trains multiple single-modal feature extractors to improve subsequent classification performance, in contrast to existing FL approaches, which aim to train a unified multimodal fusion model. Any participating hospital, even one with a small dataset or limited devices, can use these federatedly trained extractors to perform local downstream tasks (e.g., classification) while preserving data privacy. Through comprehensive evaluations on two publicly available datasets, we demonstrate that FedMM notably outperforms two baselines in both accuracy and AUC.
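The idea of training one extractor per modality, rather than one fused model, can be illustrated with a minimal sketch. The abstract does not state how FedMM aggregates client updates, so the sample-size-weighted averaging (FedAvg-style) used here is an assumption, as are the modality names (`image`, `genomic`), the flattened weight vectors, and the helper names `fed_avg` and `aggregate_per_modality`. The sketch also assumes every sample at a hospital carries all modalities that hospital holds.

```python
from typing import Dict, List, Tuple

Weights = List[float]  # flattened extractor parameters (illustrative only)


def fed_avg(updates: List[Weights], sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client parameter vectors."""
    total = sum(sizes)
    dim = len(updates[0])
    return [
        sum(w[i] * n for w, n in zip(updates, sizes)) / total
        for i in range(dim)
    ]


def aggregate_per_modality(clients: List[dict]) -> Dict[str, Weights]:
    """Aggregate one extractor per modality, using only the hospitals
    that actually hold that modality (assumed aggregation rule)."""
    by_modality: Dict[str, Tuple[List[Weights], List[int]]] = {}
    for client in clients:
        for modality, weights in client["extractors"].items():
            updates, sizes = by_modality.setdefault(modality, ([], []))
            updates.append(weights)
            sizes.append(client["num_samples"])
    return {m: fed_avg(u, s) for m, (u, s) in by_modality.items()}


# Hypothetical hospitals with heterogeneous, partly overlapping modalities.
hospitals = [
    {"num_samples": 100, "extractors": {"image": [1.0, 2.0], "genomic": [0.0, 4.0]}},
    {"num_samples": 300, "extractors": {"image": [3.0, 6.0]}},
    {"num_samples": 100, "extractors": {"genomic": [2.0, 0.0]}},
]

global_extractors = aggregate_per_modality(hospitals)
print(global_extractors)
```

Because each modality is aggregated independently, a hospital that holds only one modality still contributes to, and benefits from, the shared extractor for that modality. A downstream classifier would then be trained locally on features from these extractors, which this sketch does not cover.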