One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional Federated Learning (FL). Despite these advantages, existing methods struggle with client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Latent Diffusion Models (LDMs), pretrained on large-scale datasets, have recently shown remarkable progress in synthesizing high-quality images, and thus offer a potential way to overcome these issues. However, directly applying a pretrained LDM to heterogeneous OSFL produces synthetic data with significant distribution shifts, which degrades the performance of classification models trained on that data. The issue is particularly pronounced in rare domains, such as medical imaging, that are underrepresented in the LDM's pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both the instance level and the concept level. As a result, FedBiP synthesizes images that follow each client's local data distribution without violating privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. We validate our method through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.