Zeroth-order (ZO) optimization enables dimension-free communication in federated learning (FL), making it attractive for fine-tuning large language models (LLMs) thanks to its significant communication savings. However, existing ZO-FL methods largely overlook curvature information, despite its well-established benefits for accelerating convergence. To address this, we propose HiSo, a Hessian-informed ZO federated optimization method that accelerates convergence by leveraging global diagonal Hessian approximations, while strictly preserving scalar-only communication and transmitting no second-order information. Theoretically, for non-convex functions, we show that HiSo can achieve an accelerated convergence rate that is independent of the Lipschitz constant $L$ and the model dimension $d$ under suitable Hessian approximation assumptions, offering a plausible explanation for the observed phenomenon that ZO convergence is often much faster than its worst-case $\mathscr{O}(d)$ bound. Empirically, across diverse LLM fine-tuning benchmarks, HiSo delivers a 1$\sim$5$\times$ speedup in communication rounds over existing state-of-the-art ZO-FL baselines. This superior convergence not only cuts communication costs but also provides strong empirical evidence that Hessian information acts as an effective accelerator in federated ZO optimization settings. Our source code is available at https://github.com/ZidongLiu/DeComFL.
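To make the idea concrete, the following is a minimal sketch (not the paper's exact HiSo algorithm) of a Hessian-informed zeroth-order step: a shared random seed lets server and clients regenerate the same perturbation direction, so only the seed and one scalar finite-difference value ever need to be communicated, while a diagonal Hessian approximation preconditions the update locally. The function name, learning rate, and toy quadratic below are illustrative assumptions.

```python
import numpy as np

def hessian_informed_zo_step(loss, x, hess_diag, lr=0.2, mu=1e-3, seed=0):
    """One illustrative Hessian-preconditioned ZO step (sketch, not HiSo itself).

    In an FL setting, only `seed` and the scalar `g` would be exchanged:
    every party regenerates the identical direction z from the shared seed,
    and the diagonal Hessian approximation is applied locally, so no
    second-order information is transmitted.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(x.shape)  # shared random direction
    # Two-point (central-difference) ZO gradient estimate: a single scalar.
    g = (loss(x + mu * z) - loss(x - mu * z)) / (2 * mu)
    # Precondition the shared direction by the inverse diagonal Hessian.
    x_new = x - lr * g * z / np.maximum(hess_diag, 1e-8)
    return g, x_new

# Toy ill-conditioned quadratic: curvature differs 10x across coordinates.
H = np.array([10.0, 1.0])
quad = lambda v: 0.5 * (H * v**2).sum()
x = np.array([1.0, 1.0])
for t in range(100):  # a fresh seed per round stands in for server broadcast
    _, x = hessian_informed_zo_step(quad, x, hess_diag=H, seed=t)
```

On a quadratic, the central difference is exact, so `g` equals the directional derivative $z^\top \nabla f(x)$; preconditioning by the inverse diagonal curvature equalizes per-coordinate progress, which is the intuition behind the $L$- and $d$-independent rate claimed above.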