FedMultimodal: A Benchmark For Multimodal Federated Learning

Over the past few years, Federated Learning (FL) has emerged as a machine learning technique that tackles data privacy challenges through collaborative training. In an FL algorithm, each client submits a locally trained model, and the server aggregates the clients' parameters, repeating this exchange until convergence. Although significant effort has gone into FL for computer vision, audio, and natural language processing, FL applications that use multimodal data streams remain largely unexplored. Multimodal learning has broad real-world applications in emotion recognition, healthcare, multimedia, and social media, all areas where user privacy is a critical concern. Yet no existing FL benchmark targets multimodal applications or related tasks. To facilitate research in multimodal FL, we introduce FedMultimodal, the first FL benchmark for multimodal learning, covering five representative multimodal applications from ten commonly used datasets with a total of eight unique modalities. FedMultimodal offers a systematic FL pipeline that provides an end-to-end modeling framework, ranging from data partitioning and feature extraction to FL benchmark algorithms and model evaluation. Unlike existing FL benchmarks, FedMultimodal provides a standardized approach to assessing the robustness of FL against three common data corruptions in real-life multimodal applications: missing modalities, missing labels, and erroneous labels. We hope that FedMultimodal can accelerate numerous future research directions, including the design of multimodal FL algorithms for extreme data heterogeneity, robust multimodal FL, and efficient multimodal FL. The datasets and benchmark results can be accessed at: https://github.com/usc-sail/fed-multimodal.
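
The server-side aggregation step described above can be illustrated with a minimal sketch. It assumes a FedAvg-style weighted average in which each client's parameters are weighted by its local sample count; the abstract does not specify the aggregation rule, so this choice, the `aggregate` function, and the dictionary-of-lists parameter layout are all hypothetical and are not taken from the FedMultimodal code base.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def aggregate(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Return the sample-count-weighted average of the clients' parameters.

    This is an assumed FedAvg-style rule, used here only for illustration.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    merged: Params = {}
    for name, reference in client_params[0].items():
        # Average each coordinate of this parameter across all clients.
        merged[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return merged


if __name__ == "__main__":
    # Two toy clients holding 1 and 3 local samples respectively.
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(aggregate(clients, [1, 3]))
```

In a full training loop the server would send the merged parameters back to the clients, each client would train locally again, and the exchange would repeat until the model converges.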
