In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the standard method for computing $\epsilon$-approximate solutions to the ERM problem. As with standard optimization algorithms, the convergence analysis of FedAve relies only on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data as well. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can outperform FedAve, we introduce a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in the parameter, $\eta$-H\"older smoothness in the data, etc., we prove that the federated oracle complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. We then show that when $d$ is small and the loss function is sufficiently smooth in the data, FedLRGD outperforms FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.
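The statement that smoothness in the data induces an approximate low rank structure can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions, not the FedLRGD algorithm: the logistic-type `loss`, the helper `truncation_errors`, the scalar data ($d = 1$), and the sampling ranges are all hypothetical choices made here. It tabulates the loss over data samples and parameter values, then reports how well truncated SVDs of that table approximate it.

```python
import numpy as np


def loss(x, theta):
    # Hypothetical toy loss (an assumption, not taken from the paper):
    # smooth in both the data point x and the parameter theta.
    return np.log1p(np.exp(-x * theta))


def truncation_errors(matrix, ranks):
    # Relative Frobenius error of the best rank-r approximation.
    # By the Eckart-Young theorem this is the root of the tail sum of
    # squared singular values, divided by the root of the full sum.
    s = np.linalg.svd(matrix, compute_uv=False)
    total = np.sqrt(np.sum(s**2))
    return [float(np.sqrt(np.sum(s[r:] ** 2)) / total) for r in ranks]


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)   # scalar data samples, so d = 1
theta = np.linspace(-1.0, 1.0, 100)   # parameter values to tabulate
table = loss(x[:, None], theta[None, :])  # table[i, j] = loss(x_i, theta_j)

ranks = [1, 2, 4, 8]
errors = truncation_errors(table, ranks)
for r, e in zip(ranks, errors):
    print(f"rank {r}: relative error {e:.2e}")
```

The relative error falls quickly as the rank grows, so a handful of columns of the table already capture it to high accuracy. This is the kind of structure that would let a server approximate many clients' gradients from a small number of learned weights.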