State-of-the-art federated learning algorithms such as FedAvg require carefully tuned stepsizes to achieve their best performance. Existing adaptive federated methods improve on this but require tuning additional hyperparameters, such as momentum parameters, and are adaptive only in the server aggregation round, not locally. These methods can be inefficient in many practical scenarios because they require excessive hyperparameter tuning and do not capture local geometric information. In this work, we extend the recently proposed stochastic Polyak stepsize (SPS) to the federated learning setting and propose new locally adaptive and nearly parameter-free distributed SPS variants (FedSPS and FedDecSPS). We prove that FedSPS converges linearly in the strongly convex setting and sublinearly in the convex setting when the interpolation condition (overparametrization) is satisfied, and that it converges to a neighborhood of the solution in the general case. We extend the proposed method to a decreasing-stepsize version, FedDecSPS, which also converges when the interpolation condition does not hold. We validate our theoretical claims with illustrative convex experiments. Our proposed algorithms match the optimization performance of FedAvg with the best tuned hyperparameters in the i.i.d. case, and outperform FedAvg in the non-i.i.d. case.
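To make the idea of a locally adaptive stepsize concrete, the following is a minimal sketch, not the authors' algorithm as specified in the paper. It assumes the bounded stochastic Polyak stepsize from the original SPS work, `min((loss - lower_bound) / (c * ||grad||^2), gamma_b)`, applied inside each client's local steps, followed by plain server averaging. The function names, the least-squares objective, the lower bound of 0, and the defaults `c=0.5` and `gamma_b=1.0` are all illustrative assumptions.

```python
import numpy as np


def sps_stepsize(loss, grad, c=0.5, gamma_b=1.0, loss_lower_bound=0.0):
    """Bounded stochastic Polyak stepsize (assumed form; c, gamma_b are illustrative)."""
    grad_sq = float(np.dot(grad, grad))
    if grad_sq == 0.0:
        return 0.0  # zero gradient: the update would not move the iterate anyway
    return min((loss - loss_lower_bound) / (c * grad_sq), gamma_b)


def fed_sps_sketch(clients, dim, rounds=50, local_steps=5, batch_size=8, seed=0):
    """FedAvg-style loop where every local SGD step uses its own Polyak stepsize."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for A, b in clients:
            x_local = x.copy()
            for _ in range(local_steps):
                idx = rng.choice(len(b), size=batch_size, replace=False)
                residual = A[idx] @ x_local - b[idx]
                loss = 0.5 * np.mean(residual ** 2)      # mini-batch loss
                grad = A[idx].T @ residual / batch_size  # mini-batch gradient
                x_local -= sps_stepsize(loss, grad) * grad
            local_models.append(x_local)
        x = np.mean(local_models, axis=0)  # server step: average the local models
    return x


def make_clients(n_clients=4, n_samples=32, dim=5, seed=1):
    """Hypothetical noiseless least-squares data, so interpolation holds."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=dim)
    clients = []
    for _ in range(n_clients):
        A = rng.normal(size=(n_samples, dim))
        clients.append((A, A @ x_star))  # every sample is fit exactly by x_star
    return clients, x_star


def global_loss(clients, x):
    """Average of the clients' full-batch least-squares losses."""
    return float(np.mean([0.5 * np.mean((A @ x - b) ** 2) for A, b in clients]))


if __name__ == "__main__":
    clients, x_star = make_clients()
    x = fed_sps_sketch(clients, dim=5)
    print("final loss:", global_loss(clients, x))
    print("distance to x_star:", float(np.linalg.norm(x - x_star)))
```

In this toy setup all clients share one exact minimizer, so the lower bound of 0 is the true optimal mini-batch loss and no stepsize is tuned by hand. The sketch does not cover the decreasing-stepsize variant or heterogeneous (non-i.i.d.) data.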