Multimodal Federated Learning (MMFL) enables privacy-preserving collaborative learning across decentralized clients with heterogeneous data and modality availability. However, most existing MMFL methods cast multimodal training as a joint optimization problem, overlooking a key bottleneck: modality competition, where dominant modalities suppress weaker ones and lead to suboptimal global models. To address this, we propose FedMChain, a balanced MMFL framework that structures federated multimodal training as a chain of modality-wise phases. This phase-wise design gives each modality a dedicated local optimization window on multimodal clients to mitigate modality competition, and further promotes cross-modal complementarity via an error-compensated regularizer. On the server side, we employ a sparse sign-guided aggregation strategy that leverages directional sign agreement for robust intra-modality aggregation, avoids destructive averaging, and supports less frequent synchronization to reduce communication overhead. Extensive experiments on multimodal benchmarks demonstrate that FedMChain consistently improves predictive performance while requiring less frequent communication than baselines.
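To make the idea of sign-guided aggregation concrete, the following is a minimal sketch, not the FedMChain aggregation rule itself, which the abstract does not specify. It assumes each client sends a flat update vector for one modality, and that the server keeps a coordinate only when enough clients agree on its sign, averaging just the agreeing values. The function name `sign_guided_aggregate` and the `min_agreement` threshold are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_guided_aggregate(
    updates: Sequence[Sequence[float]],
    min_agreement: float = 0.6,  # hypothetical threshold, not from the paper
) -> List[float]:
    """Aggregate client updates coordinate by coordinate using sign agreement.

    For each coordinate, the majority sign across clients is found. If the
    fraction of clients sharing that sign is below `min_agreement`, the
    coordinate is set to zero (this is what makes the result sparse).
    Otherwise only the agreeing values are averaged, so values pointing in
    the opposite direction cannot cancel them out.
    """
    if not updates:
        raise ValueError("need at least one client update")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all client updates must have the same length")

    aggregated: List[float] = []
    for j in range(dim):
        column = [u[j] for u in updates]
        positives = sum(1 for v in column if v > 0)
        negatives = sum(1 for v in column if v < 0)
        # Ties resolve to +1 here; with the default threshold a tie is
        # dropped anyway because its agreement is at most 0.5.
        majority = 1 if positives >= negatives else -1
        agreeing = [v for v in column if sign(v) == majority]
        if not agreeing or len(agreeing) / len(column) < min_agreement:
            aggregated.append(0.0)
        else:
            aggregated.append(sum(agreeing) / len(agreeing))
    return aggregated


# Three clients, three coordinates.
client_updates = [
    [1.0, -2.0, 0.5],
    [3.0, 2.0, -0.5],
    [2.0, -4.0, 0.0],
]
print(sign_guided_aggregate(client_updates))  # [2.0, -3.0, 0.0]
```

In this toy example, plain averaging would give about -1.33 for the second coordinate because the dissenting client partially cancels the other two. The sign-guided rule averages only the two agreeing clients and returns -3.0, while the third coordinate is dropped because no sign reaches the agreement threshold.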