Federated learning (FL) trains a model jointly across a set of participating devices without requiring them to share their privately held data. Non-i.i.d. data across the network, low device participation, high communication costs, and the requirement that data remain private make it challenging to understand the convergence of FL algorithms, particularly how convergence scales with the number of participating devices. In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today, as well as its Nesterov accelerated variant, and conduct a systematic study of how their convergence scales with the number of participating devices under non-i.i.d. data and partial participation in convex settings. We provide a unified analysis that establishes convergence guarantees for FedAvg on strongly convex, convex, and overparameterized strongly convex problems. We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies. For strongly convex and convex problems, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm; these are the first linear speedup guarantees for momentum variants of FedAvg in convex settings. Empirical studies of the algorithms in various settings support our theoretical results.
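For readers unfamiliar with the setting, the following is a minimal sketch of FedAvg with partial participation on a toy strongly convex problem. It is an illustration only, not the paper's experimental setup: the scalar quadratic objectives f_k(w) = 0.5 (w - c_k)^2, the decaying step size, the Gaussian gradient noise, and the names `local_sgd` and `fedavg` are all assumptions made for this example. The differing centers c_k stand in for non-i.i.d. data, and the Nesterov accelerated variant is not shown.

```python
import random


def local_sgd(w, center, lr, local_steps, rng, noise_std):
    """Run noisy gradient steps on the toy objective f_k(w) = 0.5 * (w - center)**2."""
    for _ in range(local_steps):
        # Exact gradient (w - center) plus Gaussian noise mimics a stochastic gradient.
        grad = (w - center) + rng.gauss(0.0, noise_std)
        w -= lr * grad
    return w


def fedavg(centers, rounds=200, sampled=5, local_steps=5,
           lr0=0.5, noise_std=0.1, seed=0):
    """Toy FedAvg: sample devices, run local SGD from the global model, average."""
    rng = random.Random(seed)
    w = 0.0  # global model, here a single scalar
    for t in range(rounds):
        lr = lr0 / (1 + t)  # decaying step size (an assumption of this sketch)
        # Partial participation: only `sampled` of the devices take part this round.
        participants = rng.sample(range(len(centers)), sampled)
        local_models = [
            local_sgd(w, centers[k], lr, local_steps, rng, noise_std)
            for k in participants
        ]
        # Server step: average the local models returned by the participants.
        w = sum(local_models) / len(local_models)
    return w


if __name__ == "__main__":
    # Each device has a different minimizer, so local objectives disagree.
    centers = [float(k) for k in range(10)]
    print("global optimum:", sum(centers) / len(centers))
    print("FedAvg estimate:", fedavg(centers))
```

With equal device weights, the minimizer of the averaged objective is the mean of the centers, so the returned value can be compared against that mean. Varying `sampled` and `local_steps` gives a rough feel for how participation and local computation affect the estimate, though this toy problem says nothing about the rates established in the paper.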