Federated learning is a popular distributed learning paradigm in machine learning. Composition optimization, meanwhile, is an effective hierarchical learning model that appears in many machine learning applications, such as meta learning and robust learning. Although a few federated composition optimization algorithms have recently been proposed, they still suffer from high sample and communication complexities. In this paper, we therefore propose a class of faster federated compositional optimization algorithms (i.e., MFCGD and AdaMFCGD) for solving nonconvex distributed composition problems, which build on momentum-based variance-reduction and local-SGD techniques. In particular, our adaptive algorithm (i.e., AdaMFCGD) uses a unified adaptive matrix to flexibly incorporate various adaptive learning rates. Moreover, we provide a solid theoretical analysis of our algorithms under the non-i.i.d. setting, and prove that they simultaneously achieve lower sample and communication complexities than existing federated compositional algorithms. Specifically, our algorithms achieve a sample complexity of $\tilde{O}(\epsilon^{-3})$ and a communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary solution. We conduct numerical experiments on robust federated learning and distributed meta learning tasks to demonstrate the efficiency of our algorithms.
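For orientation, the objects named in the abstract can be written out in a commonly used generic form. This is only a sketch under stated assumptions: the abstract itself does not give the formulation, so the client count $K$, the functions $f_k$ and $g_k$, the random variables $\xi_k$ and $\zeta_k$, the loss $\ell$, and the momentum parameter $\alpha_t$ are illustrative symbols, and the last recursion is the generic single-level momentum-based variance-reduced estimator rather than the exact MFCGD or AdaMFCGD update.

```latex
% Assumed generic distributed composition problem over K clients
% (K, f_k, g_k, \xi_k, \zeta_k are illustrative, not taken from the abstract).
\[
  \min_{x \in \mathbb{R}^d} F(x) := \frac{1}{K} \sum_{k=1}^{K} f_k\bigl(g_k(x)\bigr),
  \qquad
  g_k(x) = \mathbb{E}_{\xi_k}\bigl[g_k(x;\xi_k)\bigr], \quad
  f_k(y) = \mathbb{E}_{\zeta_k}\bigl[f_k(y;\zeta_k)\bigr].
\]

% Chain rule: the gradient couples the inner Jacobian with the outer gradient,
% which is why plugging in a stochastic estimate of g_k(x) gives a biased gradient.
\[
  \nabla F(x) = \frac{1}{K} \sum_{k=1}^{K} \nabla g_k(x)^{\top} \nabla f_k\bigl(g_k(x)\bigr).
\]

% One common definition of an \epsilon-stationary solution for nonconvex F.
\[
  \mathbb{E}\,\bigl\|\nabla F(x)\bigr\| \le \epsilon.
\]

% Generic momentum-based variance-reduced estimator for a single-level
% stochastic loss \ell (assumed form, with \alpha_t \in (0,1]).
\[
  v_t = \nabla \ell(x_t;\xi_t) + (1-\alpha_t)\bigl(v_{t-1} - \nabla \ell(x_{t-1};\xi_t)\bigr).
\]
```

Under these assumptions, sample complexity counts the stochastic evaluations of $f_k$, $g_k$ and their derivatives needed to reach an $\epsilon$-stationary point, while communication complexity counts the rounds in which clients synchronize with the server.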