Federated bilevel optimization has attracted increasing attention due to emerging machine learning and communication applications. The main challenge lies in computing the gradient of the upper-level objective function (i.e., the hypergradient) in the federated setting, because doing so requires the nonlinear and distributed construction of a series of global Hessian matrices. In this paper, we propose a novel communication-efficient federated hypergradient estimator via aggregated iterative differentiation (AggITD). AggITD is simple to implement and significantly reduces the communication cost by performing federated hypergradient estimation and lower-level optimization simultaneously. We show that the proposed AggITD-based algorithm achieves the same sample complexity as existing approximate implicit differentiation (AID)-based approaches while requiring far fewer communication rounds in the presence of data heterogeneity. Our results also highlight a substantial advantage of iterative differentiation (ITD) over AID in federated/distributed hypergradient estimation. This contrasts with non-distributed bilevel optimization, where ITD is less efficient than AID. Extensive experiments demonstrate the effectiveness and communication efficiency of the proposed method.
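The abstract does not spell out AggITD's update rules, so the following Python/NumPy sketch illustrates only the general idea of running ITD-style hypergradient estimation alongside federated lower-level optimization. It is not the authors' algorithm. It assumes hypothetical heterogeneous quadratic client objectives $g_i(x, y) = \tfrac{1}{2} y^\top A_i y - y^\top B_i x$, an assumed upper-level objective $f(x, y) = \tfrac{1}{2}\lVert y - t\rVert^2 + \tfrac{\lambda}{2}\lVert x\rVert^2$, and forward-mode differentiation through the unrolled lower-level gradient steps. In each simulated round, every client sends both its lower-level gradient and its Jacobian-update term, and the server averages the two together. The lower-level iterate $y_k$ and the Jacobian $\partial y_k / \partial x$ therefore advance in the same round, with no separate rounds for Hessian-related quantities. All function names (`make_clients`, `client_message`, `itd_hypergradient`, `exact_hypergradient`) are hypothetical.

```python
import numpy as np


def make_clients(num_clients, dim_x, dim_y, seed=0):
    """Hypothetical heterogeneous quadratic clients:
    g_i(x, y) = 0.5 * y^T A_i y - y^T B_i x, with A_i symmetric positive definite."""
    rng = np.random.default_rng(seed)
    clients = []
    for _ in range(num_clients):
        M = rng.standard_normal((dim_y, dim_y))
        A = M @ M.T / dim_y + np.eye(dim_y)  # SPD; differs per client (data heterogeneity)
        B = rng.standard_normal((dim_y, dim_x))
        clients.append((A, B))
    return clients


def client_message(A, B, x, y, J):
    """What one client sends per round, evaluated at the current (x, y_k, J_k)."""
    grad_y = A @ y - B @ x  # local lower-level gradient  ∇_y g_i(x, y_k)
    jac_term = A @ J - B    # local Jacobian term  ∇²_yy g_i · J_k + ∇²_yx g_i
    return grad_y, jac_term


def itd_hypergradient(clients, x, target, lam, beta=0.1, K=200):
    """Forward-mode ITD sketch: one communication round per lower-level step.
    The server averages both messages, so y_k and J_k = dy_k/dx advance together."""
    dim_y, dim_x = clients[0][1].shape
    y = np.zeros(dim_y)
    J = np.zeros((dim_y, dim_x))  # y_0 does not depend on x, so dy_0/dx = 0
    for _ in range(K):
        msgs = [client_message(A, B, x, y, J) for A, B in clients]
        grad_y = np.mean([m[0] for m in msgs], axis=0)
        jac_term = np.mean([m[1] for m in msgs], axis=0)
        y = y - beta * grad_y    # federated lower-level gradient step
        J = J - beta * jac_term  # differentiate that same step w.r.t. x
    # Assumed upper level: f(x, y) = 0.5 * ||y - target||^2 + 0.5 * lam * ||x||^2
    # Hypergradient ≈ ∇_x f + J_K^T ∇_y f, evaluated at y_K.
    return lam * x + J.T @ (y - target)


def exact_hypergradient(clients, x, target, lam):
    """Closed-form reference: y*(x) = A_bar^{-1} B_bar x for the quadratic clients."""
    A_bar = np.mean([A for A, _ in clients], axis=0)
    B_bar = np.mean([B for _, B in clients], axis=0)
    J_star = np.linalg.solve(A_bar, B_bar)  # dy*/dx
    y_star = J_star @ x
    return lam * x + J_star.T @ (y_star - target)


if __name__ == "__main__":
    clients = make_clients(num_clients=5, dim_x=4, dim_y=3)
    x = np.ones(4)
    target = np.array([1.0, -1.0, 0.5])
    approx = itd_hypergradient(clients, x, target, lam=0.1)
    exact = exact_hypergradient(clients, x, target, lam=0.1)
    print("max abs error:", np.max(np.abs(approx - exact)))
```

The quadratic clients were chosen only so that the ITD estimate can be compared against a closed-form hypergradient. Because the averaged matrix $\bar A$ has eigenvalues of at least 1 and the step size is small, both $y_k$ and $J_k$ converge geometrically to $y^*(x)$ and $\partial y^*/\partial x$.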