A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of the agents' private local objective functions. We first analyze an important special case, the empirical minimax problem, where the overall objective approximates a true population minimax risk using statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. Then, we focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication at every iteration or lack performance guarantees. The exception is Local Stochastic Gradient Descent Ascent (SGDA), a descent ascent algorithm with multiple local updates that guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that in general it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When the local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-approximate solution within $\mathcal{O}(\log (1/\epsilon))$ rounds of communication, which matches the time complexity of the centralized GDA method. Finally, we show numerically that FedGDA-GT outperforms Local SGDA.
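The contrast between noiseless local updates with a constant stepsize and a gradient-tracking correction can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-agent quadratic objectives, the values of `a`, `c`, `u`, `v`, the stepsize `eta`, the number of local steps `K`, and the specific correction term (the global gradient at the server point minus the agent's own gradient at that point). The abstract does not spell out the FedGDA-GT update rule, so this should be read as a generic gradient-tracking variant rather than as the paper's algorithm.

```python
import numpy as np

# Hypothetical two-agent problem (an assumption, not taken from the paper):
#   f_i(x, y) = a_i/2 * (x - u_i)^2 + x*y - c_i/2 * (y - v_i)^2,
# which is strongly convex in x and strongly concave in y.
a = np.array([1.0, 4.0])
c = np.array([2.0, 1.0])
u = np.array([1.0, -3.0])
v = np.array([0.0, 2.0])
n = len(a)

# With z = (x, y), the descent-ascent field is F_i(z) = (df_i/dx, -df_i/dy) = M_i z - r_i.
M = np.array([[[ai, 1.0], [-1.0, ci]] for ai, ci in zip(a, c)])
r = np.stack([a * u, c * v], axis=1)


def field(i, z):
    """Descent-ascent field of agent i at the point z."""
    return M[i] @ z - r[i]


def run(rounds, K, eta, track):
    """K noiseless local steps per round, then server averaging.

    If track is True, each local step adds a correction that replaces the
    agent's gradient at the server point by the global gradient there.
    """
    z = np.zeros(2)
    for _ in range(rounds):
        g_global = np.mean([field(i, z) for i in range(n)], axis=0)
        local_points = []
        for i in range(n):
            zi = z.copy()
            corr = g_global - field(i, z) if track else np.zeros(2)
            for _ in range(K):
                zi = zi - eta * (field(i, zi) + corr)
            local_points.append(zi)
        z = np.mean(local_points, axis=0)
    return z


# Saddle point of the averaged objective: mean(M_i) z = mean(r_i).
z_star = np.linalg.solve(M.mean(axis=0), r.mean(axis=0))

err_local = np.linalg.norm(run(400, 5, 0.02, track=False) - z_star)
err_gt = np.linalg.norm(run(400, 5, 0.02, track=True) - z_star)
print(f"local updates only:     distance to saddle point = {err_local:.2e}")
print(f"with tracking correction: distance to saddle point = {err_gt:.2e}")
```

In this toy setting the uncorrected iterates settle at a point that is biased away from the saddle point, because each agent drifts toward its own local solution during the `K` local steps. With the correction, the saddle point of the averaged objective is a fixed point of every local update, so a constant stepsize suffices. Note that the correction costs one extra exchange per round to compute `g_global`.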
