Federated learning is a machine learning approach in which multiple devices collaboratively train a model with the help of a parameter server, sharing only their local updates. While gradient-based optimization techniques are widely adopted in this domain, the curvature information exploited by second-order methods is crucial for guiding and accelerating convergence. This paper introduces a scalable second-order method that makes curvature information usable when training large models in a federated setting. Our method, coined Fed-Sophia, combines a weighted moving average of the gradient with a clipping operation to find the descent direction. In addition, a lightweight estimate of the Hessian's diagonal is used to incorporate the curvature information. Numerical evaluation shows the superiority, robustness, and scalability of the proposed Fed-Sophia scheme compared to first- and second-order baselines.
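The update described above (a moving average of the gradient, a diagonal curvature estimate, and a clipping step) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the Hessian-diagonal estimator, the aggregation rule, or any hyperparameters, so the sketch assumes a Hutchinson-style estimator, plain server-side averaging of client models, and illustrative values for `lr`, `beta1`, `beta2`, and `rho`. All function names and the toy quadratic client losses are hypothetical.

```python
import numpy as np


def hutchinson_diag(hvp, dim, rng):
    """Estimate diag(H) as u * (H u) with a random +/-1 vector u.

    Assumption: the abstract only says "lightweight estimation"; a
    Hutchinson-style estimator is swapped in here for illustration.
    """
    u = rng.choice([-1.0, 1.0], size=dim)
    return u * hvp(u)


def sophia_local_step(theta, m, h, grad, hess_diag, lr=0.05,
                      beta1=0.9, beta2=0.99, rho=1.0, eps=1e-12):
    """One Sophia-style step: EMA of gradient, EMA of curvature, clipping."""
    m = beta1 * m + (1.0 - beta1) * grad        # moving average of the gradient
    h = beta2 * h + (1.0 - beta2) * hess_diag   # moving average of diag(H) estimate
    # Precondition by curvature, then clip each coordinate to [-rho, rho].
    direction = np.clip(m / np.maximum(h, eps), -rho, rho)
    return theta - lr * direction, m, h


def make_clients(num_clients=3, dim=4, seed=0):
    """Toy clients: f_i(w) = 0.5 * sum(a_i * (w - c_i)^2) (hypothetical data)."""
    rng = np.random.default_rng(seed)
    return [(rng.uniform(0.5, 2.0, dim), 3.0 + 0.1 * rng.standard_normal(dim))
            for _ in range(num_clients)]


def global_loss(theta, clients):
    """Average of the client losses at theta."""
    return float(np.mean([0.5 * np.sum(a * (theta - c) ** 2)
                          for a, c in clients]))


def run_federated(clients, rounds=50, local_steps=5, seed=1):
    """Each round: clients run local steps, the server averages their models."""
    rng = np.random.default_rng(seed)
    dim = clients[0][0].size
    theta = np.zeros(dim)
    # Assumption: each client keeps its own (m, h) state across rounds.
    states = [(np.zeros(dim), np.zeros(dim)) for _ in clients]
    for _round in range(rounds):
        local_models = []
        for i, (a, c) in enumerate(clients):
            w = theta.copy()
            m, h = states[i]
            for _step in range(local_steps):
                grad = a * (w - c)  # exact gradient of the toy quadratic
                hess_diag = hutchinson_diag(lambda v: a * v, dim, rng)
                w, m, h = sophia_local_step(w, m, h, grad, hess_diag)
            states[i] = (m, h)
            local_models.append(w)
        theta = np.mean(local_models, axis=0)  # server-side averaging (assumed)
    return theta


if __name__ == "__main__":
    clients = make_clients()
    theta = run_federated(clients)
    print("initial loss:", global_loss(np.zeros(4), clients))
    print("final loss:  ", global_loss(theta, clients))
```

The clipping bounds the per-coordinate step by `lr * rho`, which keeps the update controlled when the curvature estimate is small or noisy; only the model vector travels between client and server in this sketch, in keeping with sharing local updates rather than data.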