On Principled Local Optimization Methods for Federated Learning

Federated Learning (FL), a distributed learning paradigm that scales on-device learning collaboratively, has emerged as a promising approach for decentralized AI applications. Local optimization methods such as Federated Averaging (FedAvg) are the most prominent methods for FL applications. Despite their simplicity and popularity, the theoretical understanding of local optimization methods remains far from complete. This dissertation aims to advance the theoretical foundation of local methods in the following three directions. First, we establish sharp bounds for FedAvg, the most popular algorithm in Federated Learning. We demonstrate how FedAvg may suffer from a notion we call iterate bias, and how an additional third-order smoothness assumption may mitigate this effect and lead to better convergence rates. We explain this phenomenon from a Stochastic Differential Equation (SDE) perspective. Second, we propose Federated Accelerated Stochastic Gradient Descent (FedAc), the first principled acceleration of FedAvg, which provably improves the convergence rate and communication efficiency. Our technique relies on a potential-based perturbed iterate analysis, a novel stability analysis of generalized accelerated SGD, and a strategic tradeoff between acceleration and stability. Third, we study the Federated Composite Optimization problem, which extends the classic smooth setting by incorporating a shared non-smooth regularizer. We show that direct extensions of FedAvg may suffer from the "curse of primal averaging," resulting in slow convergence. As a solution, we propose a new primal-dual algorithm, Federated Dual Averaging, which overcomes the curse of primal averaging by employing a novel inter-client dual averaging procedure.
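For readers unfamiliar with local optimization methods, the following is a minimal sketch of the commonly described FedAvg pattern: each client runs several local SGD steps from the shared model, and the server averages the resulting local models once per communication round. The abstract does not spell out the algorithm, so everything here is an illustrative assumption: the function names (`local_sgd`, `fedavg`, `make_quadratic_grad`), the scalar model, the toy quadratic client objectives, and the Gaussian gradient noise are all hypothetical and are not taken from the dissertation.

```python
import random


def local_sgd(w, grad_fn, lr, local_steps, rng, noise_std):
    """Run `local_steps` noisy gradient steps on one client, starting from w."""
    for _ in range(local_steps):
        # Stochastic gradient: true gradient plus zero-mean Gaussian noise (assumed noise model).
        g = grad_fn(w) + rng.gauss(0.0, noise_std)
        w = w - lr * g
    return w


def fedavg(grad_fns, w0, lr, local_steps, rounds, noise_std=0.0, seed=0):
    """Toy FedAvg on a scalar model: local SGD on every client, then uniform averaging."""
    rng = random.Random(seed)
    w = w0
    for _ in range(rounds):
        # Every client starts the round from the current shared model.
        local_models = [
            local_sgd(w, grad_fn, lr, local_steps, rng, noise_std)
            for grad_fn in grad_fns
        ]
        # One communication step: the server averages the local models.
        w = sum(local_models) / len(local_models)
    return w


def make_quadratic_grad(center, curvature=1.0):
    """Gradient of the hypothetical client objective 0.5 * curvature * (w - center)**2."""
    return lambda w: curvature * (w - center)


# Three clients with different minimizers; the average objective is minimized at their mean.
centers = [-1.0, 0.0, 4.0]
grad_fns = [make_quadratic_grad(c) for c in centers]

w_exact = fedavg(grad_fns, w0=10.0, lr=0.1, local_steps=5, rounds=50)
w_noisy = fedavg(grad_fns, w0=10.0, lr=0.1, local_steps=5, rounds=50, noise_std=0.1, seed=1)
```

In this toy setting all clients share the same curvature, so the noise-free run settles at the mean of the client minimizers. It is meant only to show the local-steps-then-average structure and does not reproduce the iterate bias, acceleration, or composite settings that the dissertation analyzes.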
