FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

Federated learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train a generalized global model without sharing their private data. Most existing works propose FL systems for single-modal data, which limits their potential to exploit valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on labeled data at the client side, which is scarce in real-world applications because users are often unable to annotate their own data. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. To realize this concept in a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. To address the modality discrepancy and labeled data constraint in existing FL systems, FedMEKT comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and preserves the privacy of users' personal data and model parameters while demanding less communication cost than other baselines.
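
The embedding knowledge exchange described above can be illustrated with a minimal sketch. It is not the FedMEKT implementation: the abstract does not specify the loss or the aggregation rule, so the linear encoder, the simple averaging of client embeddings, the mean-squared-error distillation loss, and all names (`encode`, `aggregate_embeddings`, `distill_step`) are assumptions made for illustration, and only a single modality is shown. The point is that clients send embeddings of the shared proxy samples rather than model parameters, and the server fits its global encoder to the aggregated embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(weights, x):
    """Hypothetical linear encoder standing in for one modality's encoder."""
    return x @ weights

def aggregate_embeddings(client_embeddings):
    """Assumed aggregation rule: average the clients' embeddings of the same proxy samples."""
    return np.mean(np.stack(client_embeddings), axis=0)

def distill_step(weights, x_proxy, target, lr=0.05):
    """One gradient step on the MSE between the encoder's proxy embeddings and the target."""
    diff = encode(weights, x_proxy) - target
    loss = float(np.mean(diff ** 2))
    grad = 2.0 * x_proxy.T @ diff / diff.size  # gradient of the mean squared error
    return weights - lr * grad, loss

# Small shared proxy dataset: 32 samples, 8 features, embedding dimension 4.
x_proxy = rng.normal(size=(32, 8))

# Three clients with private encoders (random here; trained locally in practice).
client_weights = [rng.normal(size=(8, 4)) for _ in range(3)]
global_weights = np.zeros((8, 4))

# Clients upload only their proxy-set embeddings, not data or parameters.
client_emb = [encode(w, x_proxy) for w in client_weights]
target = aggregate_embeddings(client_emb)

# The server distills the aggregated embedding knowledge into the global encoder.
losses = []
for _ in range(200):
    global_weights, loss = distill_step(global_weights, x_proxy, target)
    losses.append(loss)
```

In this sketch the payload exchanged per round is the size of the proxy set times the embedding dimension, independent of the encoder's parameter count, which is one way to read the communication saving claimed above.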
