Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often missing modalities. However, most existing FL approaches assume uniform modality availability, limiting their applicability in practice. We introduce BLOSSOM, a task-agnostic framework for multimodal FL designed to operate under shared and sparsely observed modality conditions. BLOSSOM supports clients with arbitrary modality subsets and enables flexible sharing of model components. To address client and task heterogeneity, we propose a block-wise aggregation strategy that selectively aggregates shared components while keeping task-specific blocks private, enabling partial personalization. We evaluate BLOSSOM on multiple diverse multimodal datasets and analyse the effects of missing modalities and personalization. Our results show that block-wise personalization significantly improves performance, particularly in settings with severe modality sparsity. In modality-incomplete scenarios, BLOSSOM achieves an average performance gain of 18.7% over full-model aggregation, while in modality-exclusive settings the gain increases to 37.7%, highlighting the importance of block-wise learning for practical multimodal FL systems.
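The block-wise aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the parameter layout, the `shared_prefixes` naming convention, and plain-list weights are all assumptions made for clarity. The key idea is that shared blocks (e.g., modality encoders) are averaged only across the clients that actually hold them, while task-specific blocks remain private to each client.

```python
# Hypothetical sketch of block-wise federated aggregation: shared blocks are
# averaged across the clients that possess them; task-specific blocks are
# never aggregated. Parameter names and prefixes are illustrative assumptions.

def blockwise_aggregate(client_models, shared_prefixes=("encoder.",)):
    """client_models: list of dicts mapping parameter name -> list[float]."""
    shared_sums, shared_counts = {}, {}
    for model in client_models:
        for name, values in model.items():
            if not name.startswith(shared_prefixes):
                continue  # task-specific block: kept private, skipped here
            if name not in shared_sums:
                shared_sums[name] = [0.0] * len(values)
                shared_counts[name] = 0
            shared_sums[name] = [s + v for s, v in zip(shared_sums[name], values)]
            shared_counts[name] += 1  # count only clients that have this block
    # Average each shared parameter over the clients that contributed it.
    global_shared = {
        name: [s / shared_counts[name] for s in sums]
        for name, sums in shared_sums.items()
    }
    # Each client keeps its private blocks and adopts the aggregated shared ones.
    return [
        {name: global_shared.get(name, values) for name, values in model.items()}
        for model in client_models
    ]
```

Because the averaging count is tracked per parameter, a client missing a modality (and hence its encoder block) simply does not contribute to that block's update, which is how the sketch accommodates clients with arbitrary modality subsets.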