One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional federated learning (FL). Despite these promising prospects, existing methods face challenges from client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDMs) pretrained on large-scale datasets have shown remarkable advances in synthesizing high-quality images, presenting a potential solution to these issues. However, directly applying a pretrained LDM to heterogeneous OSFL results in significant distribution shifts in the synthetic data, which degrades the performance of classification models trained on it. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in the LDM's pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both the instance level and the concept level. As a result, FedBiP synthesizes images that follow each client's local data distribution without violating privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. We validate our method through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.