FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. Compared to traditional FL, OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats. Despite these promising prospects, existing methods struggle with client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDMs) pretrained on large-scale datasets have shown remarkable advances in synthesizing high-quality images, offering a potential solution to these issues. However, directly applying a pretrained LDM to heterogeneous OSFL causes significant distribution shifts in the synthetic data, which degrades the performance of classification models trained on it. This issue is particularly pronounced in rare domains, such as medical imaging, that are underrepresented in the LDM's pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both the instance level and the concept level. In this way, FedBiP synthesizes images that follow each client's local data distribution without violating privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. We validate our method through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.
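The abstract's central argument is that a generator which ignores a client's domain produces synthetic training data with a distribution shift, and that this shift hurts the downstream classifier. The toy below illustrates only that effect, under heavy simplifying assumptions. It is not FedBiP: there is no diffusion model, and the per-class Gaussian summary each client uploads once (`client_upload`) is a hypothetical stand-in for personalized generator parameters, chosen so the sketch runs on the Python standard library alone. Two clients hold scarce data from a domain shifted away from the "pretraining" domain. A domain-agnostic generator and a personalized one each supply synthetic data, and a nearest-centroid classifier trained on each is evaluated on the clients' real test data. All names, shifts, and sample sizes are illustrative assumptions.

```python
import random
import statistics

LABELS = (0, 1)
CLASS_MEANS = {0: -1.0, 1: 1.0}  # class means in the assumed "pretraining" domain
NOISE_STD = 0.5


def sample_domain(shift, n_per_class, rng):
    """Draw (feature, label) pairs whose class means are moved by a domain shift."""
    return [(CLASS_MEANS[y] + shift + rng.gauss(0.0, NOISE_STD), y)
            for y in LABELS for _ in range(n_per_class)]


def client_upload(local_data):
    """Hypothetical one-shot upload: per-class (mean, std) of the local features.
    Stand-in for personalized generator parameters, not FedBiP's actual payload."""
    summary = {}
    for y in LABELS:
        xs = [x for x, lbl in local_data if lbl == y]
        summary[y] = (statistics.mean(xs), statistics.stdev(xs))
    return summary


def synthesize_from_upload(summary, n_per_class, rng):
    """Personalized generator: samples follow the client's own distribution."""
    return [(rng.gauss(mu, sd), y)
            for y, (mu, sd) in summary.items() for _ in range(n_per_class)]


def generic_generator(n_per_class, rng):
    """Class-conditional but domain-agnostic: ignores every client's shift."""
    return sample_domain(0.0, n_per_class, rng)


def train_nearest_centroid(data):
    return {y: statistics.mean(x for x, lbl in data if lbl == y) for y in LABELS}


def accuracy(centroids, data):
    correct = sum(min(LABELS, key=lambda c: abs(x - centroids[c])) == y
                  for x, y in data)
    return correct / len(data)


def run(seed=0):
    rng = random.Random(seed)
    shifts = [2.0, 2.5]  # both clients sit in a domain far from pretraining
    local = [sample_domain(s, 20, rng) for s in shifts]   # scarce local data
    test = [sample_domain(s, 200, rng) for s in shifts]   # clients' real test data
    uploads = [client_upload(d) for d in local]           # a single upload per client
    personalized = [p for u in uploads for p in synthesize_from_upload(u, 200, rng)]
    generic = generic_generator(400, rng)
    all_test = [p for t in test for p in t]
    return (accuracy(train_nearest_centroid(personalized), all_test),
            accuracy(train_nearest_centroid(generic), all_test))


if __name__ == "__main__":
    pers_acc, gen_acc = run()
    print(f"trained on personalized synthetic data: {pers_acc:.3f}")
    print(f"trained on generic synthetic data:      {gen_acc:.3f}")
```

Under these assumptions, the classifier trained on domain-agnostic synthetic data should land near chance on the shifted clients, while the one trained on personalized synthetic data should be far more accurate. Uploading raw class statistics is only a convenience of the toy; it says nothing about how FedBiP meets its privacy requirements.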

翻译：单次联邦学习作为一种特殊的去中心化机器学习范式，近年来受到广泛关注。与传统联邦学习相比，单次联邦学习仅需单轮客户端数据或模型上传，从而降低了通信成本并减轻了隐私威胁。尽管前景广阔，现有方法在实际单次联邦学习系统中仍面临客户端数据异构性和数据量有限的挑战。最近，隐扩散模型通过在大规模数据集上的预训练，在合成高质量图像方面展现出显著进展，为克服这些问题提供了潜在解决方案。然而，将预训练的隐扩散模型直接应用于异构单次联邦学习会导致合成数据的分布发生显著偏移，进而使基于此类数据训练的分类模型性能下降。这一问题在罕见领域（如医学影像）中尤为突出，因为这些领域在隐扩散模型的预训练数据中代表性不足。为应对这一挑战，我们提出了联邦双层个性化方法，该方法在实例层面和概念层面对预训练的隐扩散模型进行个性化。由此，FedBiP能够在不违反隐私规范的前提下，按照客户端本地数据分布合成图像。FedBiP也是首个同时解决单次联邦学习中特征空间异构性和客户端数据稀缺性问题的方法。我们在三个具有特征空间异构性的单次联邦学习基准数据集上，以及在具有标签异构性的医学影像和卫星影像数据集上进行了大量实验验证。结果表明，FedBiP显著优于其他单次联邦学习方法，充分证明了其有效性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日