Federated learning relies heavily on distributed gradient descent techniques. When gradient information is unavailable, gradients must be estimated from zeroth-order information, typically by computing finite differences along isotropic random directions. This approach suffers from high estimation error because isotropic sampling can overlook the geometric features of the objective landscape. In this work, we propose a non-isotropic sampling method to improve the gradient estimation procedure. Our method estimates gradients in a subspace spanned by the historical trajectories of solutions, aiming to encourage the exploration of promising regions and thereby improve convergence. The proposed method samples from a covariance matrix formed as a convex combination of two parts: the first is a thin projection matrix containing a basis of the subspace, designed to improve exploitation; the second is the historical trajectories themselves. We implement this method in the zeroth-order federated setting and show that its convergence rate matches existing results while introducing no significant overhead in communication or local computation. The effectiveness of our proposal is verified in several numerical experiments against commonly used zeroth-order federated optimization algorithms.
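The core idea can be illustrated with a small numerical sketch. The snippet below contrasts the standard two-point finite-difference gradient estimator with a non-isotropic variant whose sampling covariance is a convex combination of (i) a projection onto the subspace spanned by past iterates and (ii) the empirical covariance of those iterates. All names (`f`, `zo_gradient`, `history`, the mixing weight `lam`, the jitter term) are illustrative assumptions, not the paper's actual implementation; the toy quadratic objective is only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy black-box objective: a simple quadratic (illustrative only).
    return float(x @ x)

def zo_gradient(x, sample_cov, mu=1e-4, num_samples=32):
    """Two-point finite-difference gradient estimate along directions
    drawn from N(0, sample_cov) instead of the isotropic N(0, I)."""
    d = x.shape[0]
    # Small jitter keeps the (possibly rank-deficient) covariance factorable.
    L = np.linalg.cholesky(sample_cov + 1e-8 * np.eye(d))
    g = np.zeros(d)
    for _ in range(num_samples):
        u = L @ rng.standard_normal(d)  # non-isotropic random direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_samples

# Hypothetical historical trajectory: 5 past solutions in dimension 10.
history = rng.standard_normal((5, 10))

# Part 1: projection onto the subspace spanned by the historical solutions.
Q, _ = np.linalg.qr(history.T)  # orthonormal basis, shape (10, 5)
P = Q @ Q.T                     # projection matrix onto that subspace

# Part 2: empirical covariance of the historical solutions.
Sigma = np.cov(history.T)       # shape (10, 10)

lam = 0.5                       # illustrative mixing weight
C = lam * P + (1 - lam) * Sigma # convex combination used for sampling

x = rng.standard_normal(10)
g_hat = zo_gradient(x, C)
```

Sampling directions from `N(0, C)` concentrates the finite-difference probes in the region explored by past iterates, which is the mechanism the abstract describes; the isotropic baseline corresponds to setting `C` to the identity matrix.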