Multimodal federated learning (MFL) aims to enrich model training in FL settings where clients collect measurements across multiple modalities. However, key challenges to MFL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client is diverse, and (ii) communication limitations prevent clients from uploading all their locally trained modality encoders to the server. In this paper, we propose Multimodal Federated learning with joint Modality and Client selection (MFedMC), a communication-efficient MFL framework that tackles these challenges through a decoupled architecture and selective uploading. Unlike traditional holistic fusion approaches, MFedMC separates modality encoders and fusion modules: modality encoders are aggregated at the server for generalization across diverse client distributions, while fusion modules remain local to each client for personalized adaptation to individual modality configurations and data characteristics. Building on this decoupled design, our joint selection algorithm incorporates two main components: (a) a modality selection methodology for each client, which weighs (i) the impact of the modality, gauged by Shapley value analysis, (ii) the modality encoder size as a proxy for communication overhead, and (iii) the frequency of modality encoder updates, denoted recency, to enhance generalizability; and (b) a client selection strategy for the server based on the local loss of modality encoders at each client. Experiments on five real-world datasets demonstrate that MFedMC achieves comparable accuracy to several baselines while reducing communication overhead by over 20$\times$. A demo video and our code are available at https://liangqiy.com/mfedmc/.
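The joint selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring function, weight parameters (`alpha`, `beta`, `gamma`), and data layout are all assumptions introduced for exposition. It shows a client ranking its modality encoders by a score combining Shapley-value impact, encoder size (communication cost), and staleness (recency), and the server ranking clients by reported local loss.

```python
import math

# Hypothetical scoring of a modality encoder for upload (weights are assumptions,
# not values from the paper): higher impact and staleness raise priority,
# larger encoder size (communication overhead) lowers it.
def modality_score(shapley, encoder_size_bytes, rounds_since_upload,
                   alpha=1.0, beta=1.0, gamma=1.0):
    impact = alpha * shapley                       # Shapley-value impact of the modality
    cost = beta * math.log1p(encoder_size_bytes)   # communication overhead proxy
    staleness = gamma * rounds_since_upload        # recency: stale encoders gain priority
    return impact - cost + staleness

def select_modalities(modalities, k):
    """Client side: pick the top-k modality encoders to upload this round."""
    ranked = sorted(
        modalities,
        key=lambda m: modality_score(m["shapley"], m["size"], m["staleness"]),
        reverse=True,
    )
    return [m["name"] for m in ranked[:k]]

def select_clients(client_losses, k):
    """Server side: pick the k clients whose modality encoders report the
    highest local loss, i.e., those most in need of aggregation."""
    ranked = sorted(client_losses.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, _ in ranked[:k]]

# Example: a client with three modality encoders of varying impact/size/staleness.
mods = [
    {"name": "audio", "shapley": 0.5, "size": 1e6, "staleness": 3},
    {"name": "video", "shapley": 0.9, "size": 5e7, "staleness": 0},
    {"name": "imu",   "shapley": 0.2, "size": 1e4, "staleness": 5},
]
print(select_modalities(mods, 2))                        # small/stale encoders win here
print(select_clients({"c1": 0.3, "c2": 0.9, "c3": 0.5}, 2))
```

Under these assumed weights, the small, stale IMU encoder outranks the large, freshly uploaded video encoder, illustrating the impact/overhead/recency trade-off the abstract describes.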