Pre-training is widely used in modern deep learning to improve model performance. However, in the federated learning (FL) literature, neural networks are mostly initialized with random weights. This gap motivates us to conduct a systematic study of pre-training for FL. Across multiple visual recognition benchmarks, we find that pre-training not only improves FL but also closes its accuracy gap to centralized learning, especially in the challenging case of non-IID client data. To make our findings applicable when pre-trained models are not directly available, we explore pre-training with synthetic data, or even with clients' data in a decentralized manner, and find that both already improve FL notably. Interestingly, many of the techniques we explore are complementary and can be combined to further boost performance, which we view as a critical result toward scaling up deep FL for real-world applications. We conclude with an attempt to understand the effect of pre-training on FL. We find that pre-training enables global models learned under different client data conditions to converge to the same loss basin and makes global aggregation in FL more stable. Nevertheless, pre-training does not appear to alleviate local model drift, a fundamental problem in FL under non-IID data.