An oft-cited challenge of federated learning is the presence of heterogeneity. \emph{Data heterogeneity} refers to the fact that data from different clients may follow very different distributions. \emph{System heterogeneity} refers to client devices having different system capabilities. A considerable number of federated optimization methods address this challenge. In the literature, empirical evaluations usually start federated training from random initialization. However, in many practical applications of federated learning, the server has access to proxy data for the training task that can be used to pre-train a model before federated training begins. Using four standard federated learning benchmark datasets, we empirically study the impact of starting federated learning from a pre-trained model. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of models that are more accurate (by up to 40\%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. We recommend that future work proposing federated optimization methods evaluate their performance when starting from both random and pre-trained initializations. This study raises several questions for further work on understanding the role of heterogeneity in federated optimization.
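The workflow described above (pre-train on server-side proxy data, then run federated training from that initialization) can be illustrated with a deliberately tiny sketch. It assumes FedAvg with equal client weights as the federated optimizer and a hypothetical one-parameter linear model whose clients have different true slopes as a stand-in for data heterogeneity. The abstract names neither the optimizer nor these settings. The names `CLIENT_SLOPES`, `pretrain_on_proxy`, `fedavg`, and the proxy slope of 1.9 are illustrative only. The sketch does not reproduce the study's experiments or its heterogeneity findings.

```python
import random

# Hypothetical toy setup: 1-D linear model y = w * x with squared-error loss.
# Each client's data follow a different slope (a stand-in for data heterogeneity).
CLIENT_SLOPES = [1.0, 2.0, 3.0]
XS = [1.0, 2.0, 3.0]


def make_data(slope):
    return [(x, slope * x) for x in XS]


def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def local_update(w, data, lr, steps):
    # Client side: a few full-batch gradient steps starting from the global model.
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w


def fedavg(w_init, clients, rounds=3, lr=0.01, local_steps=2):
    # Server side (assumed FedAvg): broadcast w, collect client models, average them.
    w = w_init
    for _ in range(rounds):
        w = sum(local_update(w, d, lr, local_steps) for d in clients) / len(clients)
    return w


def global_loss(w, clients):
    # Equal-weight average of the client losses.
    return sum(loss(w, d) for d in clients) / len(clients)


def pretrain_on_proxy(proxy):
    # Closed-form least-squares fit of y = w * x on hypothetical server-side proxy data.
    return sum(x * y for x, y in proxy) / sum(x * x for x, _ in proxy)


clients = [make_data(a) for a in CLIENT_SLOPES]
proxy = make_data(1.9)  # hypothetical proxy data, close to but not identical to the clients' mix

w_random = random.Random(0).uniform(-5.0, 5.0)
w_pretrained = pretrain_on_proxy(proxy)

results = {}
for name, w0 in [("random", w_random), ("pre-trained", w_pretrained)]:
    w_final = fedavg(w0, clients)
    results[name] = (global_loss(w0, clients), global_loss(w_final, clients))
    print(f"{name:>11}: initial loss {results[name][0]:.4f}, "
          f"loss after FedAvg {results[name][1]:.4f}")
```

Because the clients' slopes disagree, the averaged loss in this toy cannot fall below its value at w = 2 (28/9). With the same round budget, the run started from the pre-trained value ends closer to that floor than the run started from the random draw.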
翻译:联邦学习的一个常被提及的挑战是异质性的存在。数据异质性指不同客户端的数据可能遵循截然不同的分布,系统异质性则指客户端设备具有不同的系统能力。大量联邦优化方法致力于解决这一挑战。在现有文献中,实证评估通常从随机初始化开始联邦训练。然而,在许多联邦学习的实际应用中,服务器可以访问训练任务的代理数据,这些数据可用于在开始联邦训练之前预训练模型。通过使用四个标准联邦学习基准数据集,我们实证研究了联邦学习中从预训练模型开始的影响。不出所料,从预训练模型开始能够减少达到目标错误率所需的训练时间,并能够训练出比随机初始化更精确的模型(提升高达40%)。令人惊讶的是,我们还发现从预训练初始化开始联邦学习能够降低数据异质性和系统异质性的影响。我们建议未来的研究在提出和评估联邦优化方法时,应分别评估从随机初始化和预训练初始化开始时的性能。这项研究为深入理解异质性在联邦优化中的作用提出了若干待进一步探究的问题。