The theory of federated learning (FL) is evolving rapidly, but its practical application still faces a series of intricate challenges, and hyperparameter optimization is one of the most critical. Among the hyperparameters to be tuned, the learning rate is especially important, and adapting it promises to significantly improve the efficacy of FL systems. To address this need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm designed specifically for FL. FedHyper serves as a universal learning rate scheduler that adapts both the global and the local learning rates as training progresses. In addition, FedHyper is robust to a wide range of initial learning rate configurations and substantially reduces the need for laborious empirical learning rate tuning. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results show that FedHyper consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper improves accuracy by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
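The abstract does not spell out FedHyper's update rules, so the following is only a minimal sketch of the general idea behind hypergradient-based learning rate adaptation, applied to plain SGD on a single model. It is not FedHyper's global or local scheduler. The function name `hypergradient_sgd`, the parameter `beta` (the step size used to update the learning rate), and the toy quadratic objective are all assumptions introduced for illustration.

```python
def hypergradient_sgd(grad_fn, x0, lr0, beta, steps):
    """Plain SGD whose learning rate is itself adapted by gradient descent.

    Illustrative only: generic hypergradient descent, not FedHyper.
    grad_fn returns the gradient (a list of floats) at a given point.
    beta is a hypothetical name for the hypergradient step size.
    """
    x = list(x0)
    lr = lr0
    prev_grad = [0.0] * len(x)  # no previous gradient before the first step
    for _ in range(steps):
        grad = grad_fn(x)
        # The derivative of the loss with respect to the learning rate is
        # -(grad . prev_grad), so a descent step on lr adds beta * (grad . prev_grad).
        lr += beta * sum(g * p for g, p in zip(grad, prev_grad))
        # Ordinary SGD step using the freshly adapted learning rate.
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
        prev_grad = grad
    return x, lr


def quad_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2, which is x itself."""
    return list(x)


# Start from a deliberately small learning rate and let it adapt.
x_final, lr_final = hypergradient_sgd(
    quad_grad, [1.0, -2.0], lr0=0.01, beta=0.01, steps=100
)
```

In this toy run consecutive gradients point the same way, so their inner product is positive and the learning rate grows from its small initial value. That is the mechanism by which a hypergradient scheme can recover from a poorly chosen initial learning rate.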