Federated learning (FL) enables collaborative model training across decentralized medical institutions while preserving data privacy. However, medical FL benchmarks remain scarce, and existing efforts focus mainly on unimodal or bimodal settings and a limited range of medical tasks. This gap underscores the need for standardized evaluation to advance a systematic understanding of medical multimodal FL (MMFL). To this end, we introduce Med-MMFL, the first comprehensive MMFL benchmark for the medical domain, encompassing diverse modalities, tasks, and federation scenarios. Our benchmark evaluates six representative state-of-the-art FL algorithms, covering different aggregation strategies, loss formulations, and regularization techniques. It spans datasets with 2 to 4 modalities each, comprising a total of 10 unique medical modalities, including text, pathology images, ECG, X-ray, radiology reports, and multiple MRI sequences. Experiments are conducted across naturally federated, synthetic IID, and synthetic non-IID settings to simulate real-world heterogeneity. We assess segmentation, classification, modality alignment (retrieval), and visual question answering (VQA) tasks. To support reproducibility and fair comparison of future MMFL methods under realistic medical settings, we release the complete benchmark implementation, including data processing and partitioning pipelines, at https://github.com/bhattarailab/Med-MMFL-Benchmark .
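To make the distinction between synthetic IID and synthetic non-IID settings concrete, the following is a minimal sketch of how such client splits are often constructed in the FL literature. It is an illustration under stated assumptions, not the benchmark's actual partitioning pipeline: the function names `partition_iid` and `partition_dirichlet`, the `alpha` concentration parameter, and the choice of Dirichlet label skew as the non-IID mechanism are all hypothetical here, since the abstract does not specify how the synthetic splits are generated.

```python
import random
from collections import defaultdict


def partition_iid(labels, num_clients, seed=0):
    """IID split: shuffle sample indices, then deal them out round-robin."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]


def partition_dirichlet(labels, num_clients, alpha=0.5, seed=0):
    """Label-skewed non-IID split (assumed scheme, not taken from the paper).

    Each class is divided among clients using proportions drawn from a
    symmetric Dirichlet(alpha). Smaller alpha gives stronger label skew.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0
        # Turn cumulative proportions into cut points within this class.
        cuts, acc = [], 0.0
        for g in gammas[:-1]:
            acc += g / total
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        start = 0
        for c, end in enumerate(cuts + [len(idxs)]):
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 4 for i in range(200)]  # toy data: 4 balanced classes
    for name, parts in [
        ("iid", partition_iid(toy_labels, 5)),
        ("dirichlet", partition_dirichlet(toy_labels, 5, alpha=0.3)),
    ]:
        print(name, [len(p) for p in parts])
```

Both functions return one list of sample indices per client, and every sample is assigned to exactly one client. In a multimodal setting the same index lists would be applied to each modality of a sample so that paired data stay together on one client. Naturally federated settings need no such step, because the split follows the originating institutions.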