Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model, without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) ensure balance among the clients by using the same stepsize for local updates on all clients. However, this means that all clients need to respect the global geometry of the function, which could yield slow convergence. In this work, we propose locally adaptive federated learning algorithms that leverage the local geometric information of each client function. We show that such locally adaptive methods with uncoordinated stepsizes across all clients can be particularly efficient in interpolated (overparameterized) settings, and analyze their convergence in the presence of heterogeneous data for convex and strongly convex settings. We validate our theoretical claims by performing illustrative experiments for both i.i.d. and non-i.i.d. cases. Our proposed algorithms match the optimization performance of tuned FedAvg in the convex setting, outperform FedAvg as well as state-of-the-art adaptive federated algorithms like FedAMS for non-convex experiments, and come with superior generalization performance.
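To make the contrast between a shared stepsize and uncoordinated local stepsizes concrete, the following is a minimal sketch and not the algorithm proposed in this work. The abstract does not state the stepsize rule, so the sketch assumes a Polyak-type rule in which each client sets its stepsize from its own loss and gradient, and it assumes an interpolated least-squares problem in which every client loss reaches zero at a shared minimizer. All function names and the parameters `c` and `gamma_max` are hypothetical.

```python
import numpy as np


def make_clients(num_clients=4, n=20, d=5, seed=0):
    """Build heterogeneous least-squares clients that share one minimizer (interpolation)."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=d)
    clients = []
    for i in range(num_clients):
        # Different feature scales give each client a different local curvature.
        A = (2.0 ** (i - 1)) * rng.normal(size=(n, d))
        clients.append((A, A @ x_star))  # consistent targets: each local loss has minimum 0
    return clients


def loss_and_grad(x, A, b):
    """Local loss f_i(x) = ||Ax - b||^2 / (2n) and its gradient."""
    r = A @ x - b
    return 0.5 * np.mean(r ** 2), A.T @ r / len(b)


def local_adaptive_round(x_global, clients, local_steps=5, c=1.0, gamma_max=10.0):
    """One communication round: local steps with per-client stepsizes, then averaging."""
    local_models = []
    for A, b in clients:
        x = x_global.copy()
        for _ in range(local_steps):
            f, g = loss_and_grad(x, A, b)
            g_sq = float(g @ g)
            if g_sq == 0.0:
                break  # already at a stationary point of the local loss
            # Assumed Polyak-type rule: uses only this client's loss and gradient,
            # with the optimal local value taken to be 0 (valid under interpolation).
            gamma = min(f / (c * g_sq), gamma_max)
            x = x - gamma * g
        local_models.append(x)
    return np.mean(local_models, axis=0)  # server-side averaging, as in FedAvg


def global_loss(x, clients):
    """Average of the client losses at x."""
    return float(np.mean([loss_and_grad(x, A, b)[0] for A, b in clients]))


def run(rounds=50, seed=0):
    """Run the sketch and return the global loss before training and after each round."""
    clients = make_clients(seed=seed)
    x = np.zeros(clients[0][0].shape[1])
    history = [global_loss(x, clients)]
    for _ in range(rounds):
        x = local_adaptive_round(x, clients)
        history.append(global_loss(x, clients))
    return history
```

Calling `run()` returns the global loss per communication round. In this sketch no stepsize is tuned or shared: a client with small curvature automatically takes larger steps than a client with large curvature, whereas a single common stepsize would have to be small enough for the steepest client.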