pFedMoE: Data-Level Personalization with Mixture of Experts for Model-Heterogeneous Personalized Federated Learning

Federated learning (FL) has been widely adopted for collaborative training on decentralized data, but it faces the challenges of data, system, and model heterogeneity. These challenges have motivated model-heterogeneous personalized federated learning (MHPFL). However, protecting data and model privacy while achieving good model performance and keeping communication and computation costs low remains an open problem in MHPFL. To address it, we propose model-heterogeneous personalized Federated learning with Mixture of Experts (pFedMoE). pFedMoE assigns a shared homogeneous small feature extractor and a local gating network to each client's local heterogeneous large model. First, during local training, the local heterogeneous model's feature extractor acts as a local expert for personalized feature (representation) extraction, while the shared homogeneous small feature extractor serves as a global expert for generalized feature extraction. For each data sample, the local gating network produces personalized weights for the representations extracted by the two experts. Together, the three models form a local heterogeneous MoE. The weighted mixed representation fuses generalized and personalized features and is processed by the local heterogeneous large model's prediction header, which carries personalized prediction information. The MoE and the prediction header are updated simultaneously. Second, the trained local homogeneous small feature extractors are sent to the server, which aggregates them to fuse information across clients. Overall, pFedMoE enhances local model personalization at a fine-grained data level while supporting model heterogeneity.
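
The per-sample mixing and the server-side fusion described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names (`mix_representations`, `moe_forward`, `aggregate`) are hypothetical, the softmax over the gate outputs and the sample-count-weighted average on the server are assumptions (the abstract says only "personalized weights" and "aggregation"), and both experts are assumed to emit representations of the same dimension so that they can be mixed element-wise.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax; turns gate logits into weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_representations(gate_logits: Sequence[float],
                        global_rep: Sequence[float],
                        local_rep: Sequence[float]) -> Vector:
    """Weighted mix of the two experts' representations for ONE data sample.

    Assumption: the gate emits two logits (global expert, local expert) and
    both representations have the same length.
    """
    w_global, w_local = softmax(gate_logits)
    return [w_global * g + w_local * l for g, l in zip(global_rep, local_rep)]


def moe_forward(x: Vector,
                global_expert: Callable[[Vector], Vector],
                local_expert: Callable[[Vector], Vector],
                gate: Callable[[Vector], Vector],
                header: Callable[[Vector], Vector]) -> Vector:
    """Client-side forward pass: two experts, a gate, then the local header."""
    mixed = mix_representations(gate(x), global_expert(x), local_expert(x))
    return header(mixed)


def aggregate(extractor_params: Sequence[Sequence[float]],
              num_samples: Sequence[int]) -> Vector:
    """Server-side fusion of the clients' homogeneous small feature extractors.

    Assumption: a sample-count-weighted average over flattened parameter
    vectors; the abstract does not specify the aggregation rule.
    """
    total = sum(num_samples)
    dim = len(extractor_params[0])
    return [
        sum(n * params[i] for params, n in zip(extractor_params, num_samples)) / total
        for i in range(dim)
    ]
```

In this sketch only the parameters passed to `aggregate` would leave a client. The local expert, the gate, and the header stay on the client, which is what lets their architectures differ across clients.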
