Federated bilevel optimization has attracted increasing attention, driven by emerging machine learning and communication applications. The biggest challenge lies in computing the gradient of the upper-level objective function (i.e., the hypergradient) in the federated setting, because it depends nonlinearly on a series of global Hessian matrices that are distributed across clients. In this paper, we propose a novel communication-efficient federated hypergradient estimator via aggregated iterative differentiation (AggITD). AggITD is simple to implement and significantly reduces the communication cost by conducting the federated hypergradient estimation and the lower-level optimization simultaneously. We show that the proposed AggITD-based algorithm achieves the same sample complexity as existing approximate implicit differentiation (AID)-based approaches with far fewer communication rounds in the presence of data heterogeneity. Our results also shed light on the advantage of ITD over AID in federated/distributed hypergradient estimation. This differs from the comparison in non-distributed bilevel optimization, where ITD is less efficient than AID. Our extensive experiments demonstrate the effectiveness and communication efficiency of the proposed method.
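The difficulty named above, that the hypergradient depends nonlinearly on global Hessians, can be illustrated with a small numerical sketch. It is a generic illustration and not the AggITD algorithm itself. It assumes each client holds a symmetric positive-definite local Hessian `H_i` (a hypothetical quadratic lower-level setup), and all function names are made up for this example. The sketch compares two ways of approximating the global inverse-Hessian-vector product `(mean_i H_i)^{-1} b`: averaging the clients' local solves, which is biased under heterogeneity, and iterating with Hessian-vector products that are aggregated at every step.

```python
import numpy as np


def make_clients(n_clients=4, dim=3, seed=0):
    """Build heterogeneous symmetric positive-definite local Hessians H_i (toy data)."""
    rng = np.random.default_rng(seed)
    hessians = []
    for _ in range(n_clients):
        a = rng.standard_normal((dim, dim))
        hessians.append(a @ a.T + np.eye(dim))  # SPD with eigenvalues >= 1
    return hessians


def naive_local_average(hessians, b):
    """Average of local solves H_i^{-1} b; generally NOT (mean_i H_i)^{-1} b."""
    return np.mean([np.linalg.solve(h, b) for h in hessians], axis=0)


def aggregated_inverse_hvp(hessians, b, step, num_rounds):
    """Approximate (mean_i H_i)^{-1} b using aggregated Hessian-vector products.

    In each round every client computes H_i @ v locally and the server
    averages the results, so only vectors (never matrices) are exchanged.
    """
    v = np.zeros_like(b)
    for _ in range(num_rounds):
        avg_hvp = np.mean([h @ v for h in hessians], axis=0)
        # Gradient step on the quadratic 0.5 * v^T H v - b^T v, with H = mean_i H_i.
        v = v - step * (avg_hvp - b)
    return v


def demo():
    """Return (exact, naive, aggregated) estimates on the toy problem."""
    hessians = make_clients()
    b = np.ones(hessians[0].shape[0])
    global_hessian = np.mean(hessians, axis=0)
    exact = np.linalg.solve(global_hessian, b)
    # Step size 1 / lambda_max; computed exactly here only for the demo.
    step = 1.0 / np.linalg.eigvalsh(global_hessian).max()
    naive = naive_local_average(hessians, b)
    aggregated = aggregated_inverse_hvp(hessians, b, step, num_rounds=1000)
    return exact, naive, aggregated
```

Calling `demo()` returns the exact solution alongside both estimates. In this toy setting the aggregated iteration matches the exact solve to numerical precision, while the naive average of local solves does not, because the inverse of an average is not the average of the inverses.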