Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often missing modalities. However, most existing FL approaches assume uniform modality availability, limiting their applicability in practice. We introduce BLOSSOM, a task-agnostic framework for multimodal FL designed to operate under shared and sparsely observed modality conditions. BLOSSOM supports clients with arbitrary modality subsets and enables flexible sharing of model components. To address client and task heterogeneity, we propose a block-wise aggregation strategy that selectively aggregates shared components while keeping task-specific blocks private, enabling partial personalization. We evaluate BLOSSOM on multiple diverse multimodal datasets and analyse the effects of missing modalities and personalization. Our results show that block-wise personalization significantly improves performance, particularly in settings with severe modality sparsity. In modality-incomplete scenarios, BLOSSOM achieves an average performance gain of 18.7% over full-model aggregation, while in modality-exclusive settings the gain increases to 37.7%, highlighting the importance of block-wise learning for practical multimodal FL systems.
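The block-wise aggregation strategy described above (average shared components across the clients that hold them, keep task-specific blocks private) can be sketched as follows. This is a minimal illustration only: the block naming convention (`SHARED_PREFIXES`), the plain-list parameter representation, and the unweighted mean are all assumptions, not details from the paper.

```python
# Hypothetical sketch of block-wise aggregation: shared blocks (e.g., modality
# encoders) are averaged across the clients that possess them, while
# task-specific blocks stay private to each client. Names and the choice of an
# unweighted mean are illustrative assumptions, not BLOSSOM's actual scheme.

from typing import Dict, List

# Parameter-name prefixes treated as shared/aggregatable (assumed convention).
SHARED_PREFIXES = ("encoder.",)


def blockwise_aggregate(
    client_models: List[Dict[str, List[float]]],
) -> List[Dict[str, List[float]]]:
    """Average shared blocks across clients that hold them; keep the rest private."""
    # Sum each shared parameter over the clients that actually have it
    # (clients with missing modalities may lack some encoder blocks).
    shared_sums: Dict[str, List[float]] = {}
    shared_counts: Dict[str, int] = {}
    for model in client_models:
        for name, params in model.items():
            if name.startswith(SHARED_PREFIXES):
                if name not in shared_sums:
                    shared_sums[name] = [0.0] * len(params)
                    shared_counts[name] = 0
                shared_sums[name] = [s + p for s, p in zip(shared_sums[name], params)]
                shared_counts[name] += 1

    # Broadcast the averaged shared blocks back to every client that has them;
    # private (task-specific) blocks are left untouched, giving partial
    # personalization.
    updated = []
    for model in client_models:
        new_model = dict(model)
        for name in model:
            if name in shared_sums:
                new_model[name] = [s / shared_counts[name] for s in shared_sums[name]]
        updated.append(new_model)
    return updated
```

For example, two clients sharing an `encoder.img` block but holding distinct `head.task` blocks would receive the averaged encoder weights while each keeps its own head unchanged.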