FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional federated learning (FL). Despite these promising prospects, existing methods face challenges from client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDMs) have shown remarkable advancements in synthesizing high-quality images through pretraining on large-scale datasets, thereby presenting a potential solution to these issues. However, directly applying a pretrained LDM to heterogeneous OSFL results in significant distribution shifts in the synthetic data, which degrades the performance of classification models trained on such data. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in the LDM's pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both the instance level and the concept level. In this way, FedBiP synthesizes images that follow each client's local data distribution without violating privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. Our method is validated through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.

