Multimodal federated learning (FL) aims to enrich model training in FL settings where devices collect measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges in multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected differs from device to device, and (ii) communication limitations prevent devices from uploading all of their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that tackles both challenges. The key idea is a modality selection criterion for each device, which weighs (i) the impact of each modality, gauged by Shapley value analysis, against (ii) the size of its modality model, used as a gauge of communication overhead. This enables FedMFS to flexibly balance performance against communication cost, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate that FedMFS achieves accuracy comparable to several baselines while reducing communication overhead by over 4x.
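To make the selection criterion concrete, the following is a minimal Python sketch of how a device might trade Shapley-based impact against model size. The abstract does not give the exact form of the criterion, so everything here is an assumption for illustration: the linear weighting with a parameter `alpha`, the min-max normalisation, the `top_k` cutoff, the function names `shapley_values` and `select_modalities`, and the toy modality names and numbers. It should not be read as the FedMFS implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`.

    Enumerates all subsets, so it is only practical for a handful of modalities.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            # Weight of a coalition of size r that does not contain p.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def select_modalities(impact, size_mb, alpha=0.5, top_k=1):
    """Rank modalities by alpha * impact - (1 - alpha) * size (assumed form).

    Both terms are min-max normalised to [0, 1]; ties are broken by name.
    """
    def normalise(d):
        lo, hi = min(d.values()), max(d.values())
        span = hi - lo
        return {k: (v - lo) / span if span else 0.0 for k, v in d.items()}

    imp = normalise({m: abs(v) for m, v in impact.items()})
    size = normalise(size_mb)
    score = {m: alpha * imp[m] - (1 - alpha) * size[m] for m in impact}
    return sorted(score, key=lambda m: (-score[m], m))[:top_k]


# Hypothetical per-modality contributions and model sizes (not from the paper).
contrib = {"pressure": 0.1, "motion": 0.3, "eye": 0.2}
size_mb = {"pressure": 1.0, "motion": 20.0, "eye": 2.0}

# Toy additive value function: the "utility" of a modality subset.
impact = shapley_values(list(contrib), lambda s: sum(contrib[m] for m in s))
chosen = select_modalities(impact, size_mb, alpha=0.5, top_k=1)
```

In this sketch, `alpha = 1` ranks purely by impact and `alpha = 0` purely by model size, which mirrors the balance between performance and communication cost described above. In practice the value function would come from the fused model's outputs rather than a fixed table.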