Federated averaging (FedAvg) is a widely employed paradigm for collaboratively training models from distributed clients without sharing data. Nowadays, the neural network has achieved remarkable success due to its extraordinary performance, which makes it a preferred choice as the model in FedAvg. However, the optimization problem of the neural network is often non-convex even non-smooth. Furthermore, FedAvg always involves multiple clients and local updates, which results in an inaccurate updating direction. These properties bring difficulties in analyzing the convergence of FedAvg in training neural networks. Recently, neural tangent kernel (NTK) theory has been proposed towards understanding the convergence of first-order methods in tackling the non-convex problem of neural networks. The deep linear neural network is a classical model in theoretical subject due to its simple formulation. Nevertheless, there exists no theoretical result for the convergence of FedAvg in training the deep linear neural network. By applying NTK theory, we make a further step to provide the first theoretical guarantee for the global convergence of FedAvg in training deep linear neural networks. Specifically, we prove FedAvg converges to the global minimum at a linear rate $\mathcal{O}\big((1-\eta K /N)^t\big)$, where $t$ is the number of iterations, $\eta$ is the learning rate, $N$ is the number of clients and $K$ is the number of local updates. Finally, experimental evaluations on two benchmark datasets are conducted to empirically validate the correctness of our theoretical findings.
翻译:联邦平均(FedAvg)是一种广泛使用的范式,用于在分布式客户端之间协同训练模型而无需共享数据。如今,神经网络凭借其卓越性能取得了显著成功,这使其成为FedAvg中模型的首选。然而,神经网络的优化问题通常是非凸甚至非光滑的。此外,FedAvg总是涉及多个客户端和本地更新,这可能导致更新方向不准确。这些特性为分析FedAvg训练神经网络时的收敛性带来了困难。近年来,神经正切核(NTK)理论被提出,用于理解一阶方法解决神经网络非凸问题时的收敛性。深度线性神经网络因其简洁的公式而成为理论课题中的经典模型。然而,目前尚未有关于FedAvg训练深度线性神经网络收敛性的理论结果。通过应用NTK理论,我们进一步迈出一步,首次为FedAvg训练深度线性神经网络的全局收敛性提供了理论保证。具体来说,我们证明了FedAvg以线性速率$\mathcal{O}\big((1-\eta K /N)^t\big)$收敛到全局最小值,其中$t$为迭代次数,$\eta$为学习率,$N$为客户端数量,$K$为本地更新次数。最后,在两个基准数据集上进行了实验评估,以实证验证我们理论发现的正确性。