Federated learning (FL) is a widely used distributed paradigm for collaboratively training machine learning models across multiple clients without sharing local data. In practice, FL must cope with partial client participation caused by limited bandwidth, intermittent connections, and strict synchronization delays. Yet few theoretical convergence guarantees exist for this practical setting, especially for the non-convex optimization of neural networks. To bridge this gap, we study the training of the federated averaging (FedAvg) method on two canonical models: a deep linear network and a two-layer ReLU network. Under an over-parameterization assumption, we prove that FedAvg converges to a global minimum at a linear rate $\mathcal{O}\left(\left(1-\frac{\min_{i \in [t]}|S_i|}{N^2}\right)^t\right)$ after $t$ iterations, where $N$ is the number of clients and $|S_i|$ is the number of clients participating in the $i$-th iteration. Experimental evaluations confirm our theoretical results.
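To make the stated rate concrete, the following minimal sketch evaluates the factor $\left(1-\frac{\min_{i \in [t]}|S_i|}{N^2}\right)^t$ for a given participation schedule. It ignores the constants hidden by $\mathcal{O}(\cdot)$, so the values are illustrative only and are not a bound on any actual loss. The function name `rate_envelope` and its arguments are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def rate_envelope(participants_per_round: Sequence[int], num_clients: int) -> float:
    """Evaluate (1 - min_i |S_i| / N^2)^t for a participation schedule.

    participants_per_round[i] plays the role of |S_i| and num_clients of N.
    Constants hidden by the O(.) notation are dropped (an assumption of
    this sketch), so the result only shows how the rate scales.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    if any(s < 1 or s > num_clients for s in participants_per_round):
        raise ValueError("each |S_i| must lie in [1, num_clients]")

    t = len(participants_per_round)
    if t == 0:
        return 1.0  # no iterations performed yet

    # The rate is governed by the round with the fewest participants.
    worst_round = min(participants_per_round)
    return (1.0 - worst_round / num_clients**2) ** t


# Partial participation: N = 10 clients, three rounds with 5, 3, and 4 clients.
partial = rate_envelope([5, 3, 4], num_clients=10)
# Full participation: all 10 clients take part in each of the three rounds.
full = rate_envelope([10, 10, 10], num_clients=10)
```

Under these assumptions, a smaller worst-case participation $\min_i |S_i|$ pushes the per-iteration factor closer to 1 and therefore slows the rate, while full participation ($|S_i| = N$ in every round) gives the factor $1 - 1/N$.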