Compositional optimization (CO) has recently gained popularity because of its applications in distributionally robust optimization (DRO) and many other machine learning problems. The large-scale, distributed nature of modern data demands efficient federated learning (FL) algorithms for solving CO problems. Developing FL algorithms for CO is particularly challenging because of the compositional nature of the objective. Moreover, current state-of-the-art methods for such problems rely on large-batch gradients, with batch sizes that depend on the target solution accuracy, which is infeasible in most practical settings. To address these challenges, we propose efficient FedAvg-type algorithms for solving non-convex CO in the FL setting. We first establish that vanilla FedAvg is not suitable for solving distributed CO problems: data heterogeneity in the compositional objective at each client amplifies the bias in the local compositional gradient estimates. We therefore propose a novel FL framework, FedDRO, that exploits the DRO problem structure to design a communication strategy that allows FedAvg to control the bias in the estimation of the compositional gradient. A key novelty of our work is the development of solution-accuracy-independent algorithms that do not require large-batch gradients (or function evaluations) for solving federated CO problems. We establish $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients. We corroborate our theoretical findings with empirical studies on large-scale DRO problems.
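The bias argument in the abstract can be made concrete with a toy problem. The sketch below is not the FedDRO algorithm; it is a minimal deterministic illustration under assumed choices: a scalar objective $f(\frac{1}{K}\sum_k g_k(x))$ with hypothetical inner functions $g_k(x) = a_k x$ and outer function $f(u) = u^2$. All function names are invented for this example. It compares the true gradient with (i) the average of gradients that each client computes from its own inner value only, as vanilla local updates would, and (ii) the average obtained when clients first exchange the mean inner value.

```python
import numpy as np


def inner(x, a):
    # Hypothetical client inner function g_k(x) = a_k * x (an assumption for
    # illustration; `a` holds one coefficient per client).
    return a * x


def inner_grad(x, a):
    # Derivative of g_k with respect to x, one entry per client.
    return a


def outer_grad(u):
    # Assumed outer function f(u) = u**2, so f'(u) = 2u.
    return 2.0 * u


def global_gradient(x, a_all):
    # Gradient of f(mean_k g_k(x)): chain rule applied to the averaged inner value.
    g_bar = np.mean(inner(x, a_all))
    return np.mean(inner_grad(x, a_all)) * outer_grad(g_bar)


def naive_local_average(x, a_all):
    # Each client differentiates f(g_k(x)) using only its own inner value,
    # then the server averages the results.
    return np.mean(inner_grad(x, a_all) * outer_grad(inner(x, a_all)))


def shared_inner_average(x, a_all):
    # Clients first exchange the mean inner value, then plug it into f'.
    g_bar = np.mean(inner(x, a_all))
    return np.mean(inner_grad(x, a_all) * outer_grad(g_bar))


if __name__ == "__main__":
    heterogeneous = np.array([1.0, 3.0])  # clients with different data
    x = 1.0
    print("true gradient:       ", global_gradient(x, heterogeneous))
    print("naive local average: ", naive_local_average(x, heterogeneous))
    print("shared inner average:", shared_inner_average(x, heterogeneous))
```

With $a = (1, 3)$ and $x = 1$, the true gradient is 8, the naive local average is 10, and the shared-inner average is 8. In this toy case the gap equals $2x\,\mathrm{Var}(a_k)$, so it vanishes when all clients are identical and grows with heterogeneity. This loosely mirrors why communicating information about the inner function can help control the bias.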