One-shot federated learning (OSFL) has attracted attention in recent years because of its low communication cost. However, most existing methods require auxiliary datasets or the training of generators, which limits their practicality in real-world scenarios. In this paper, we explore the opportunities that diffusion models bring to OSFL and propose FedCADO, which uses guidance from client classifiers to generate data that follows the clients' distributions and then trains the aggregated model on the server. Specifically, our method involves targeted optimizations in two aspects. First, we conditionally edit the randomly sampled initial noise, embedding the specified semantics and distributions in it, which significantly improves both the quality and the stability of generation. Second, we use the BN statistics of the classifiers to provide detailed guidance during generation. These optimizations allow us to generate an unlimited number of samples that closely match the distribution and quality of the original client datasets. Our method handles heterogeneous client models as well as non-IID features or labels. In terms of privacy protection, our method avoids training any generator or transferring any auxiliary information on clients, and therefore introduces no additional privacy leakage risk. By leveraging the extensive knowledge stored in the pre-trained diffusion model, the synthetic datasets help overcome the knowledge limitations of the client samples, yielding aggregated models that in some cases even exceed the performance ceiling of centralized training, as demonstrated by extensive quantitative and visualization experiments on three large-scale multi-domain image datasets.
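The abstract does not state the exact form of the BN-statistics guidance. One common way to turn BN statistics into a generation signal, assumed here purely for illustration, is to penalize the gap between the per-channel mean and variance of features computed on a batch of generated samples and the running statistics stored in a classifier's BN layer. The sketch below computes such a penalty in plain Python; the function name `bn_statistics_loss`, the list-based feature layout, and the squared-difference form are hypothetical choices, not details taken from FedCADO.

```python
from statistics import fmean, pvariance


def bn_statistics_loss(batch_features, running_mean, running_var):
    """Illustrative BN-statistics penalty (assumed form, not the paper's).

    batch_features: list of samples, each a list of per-channel activations.
    running_mean, running_var: per-channel statistics stored in a BN layer.
    Returns the sum over channels of the squared gaps in mean and variance.
    """
    loss = 0.0
    for c in range(len(running_mean)):
        # Collect channel c across the whole batch.
        channel = [sample[c] for sample in batch_features]
        mean_c = fmean(channel)
        # Population variance of the batch for this channel.
        var_c = pvariance(channel, mu=mean_c)
        loss += (mean_c - running_mean[c]) ** 2 + (var_c - running_var[c]) ** 2
    return loss


# Toy batch: two samples, two channels.
features = [[1.0, 0.0], [3.0, 0.0]]
matched = bn_statistics_loss(features, [2.0, 0.0], [1.0, 0.0])
mismatched = bn_statistics_loss(features, [0.0, 0.0], [1.0, 1.0])
```

In this toy case the penalty is zero when the batch statistics coincide with the stored ones and grows as they drift apart. In a guided sampler, the gradient of such a term with respect to the intermediate sample would be one plausible guidance signal alongside the classifier's class prediction.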