This paper addresses communication issues that arise when estimating hyper-gradients in decentralized federated learning (FL). Hyper-gradients in decentralized FL quantify how the performance of the globally shared optimal model is influenced by perturbations in clients' hyper-parameters. In prior work, clients trace this influence by communicating Hessian matrices over a static undirected network, which results in (i) excessive communication costs and (ii) an inability to exploit more efficient and robust networks, namely time-varying directed networks. To solve these issues, we introduce an alternative optimality condition for FL based on an averaging operation over model parameters and gradients. We then employ Push-Sum, a consensus optimization technique for time-varying directed networks, as the averaging operation. As a result, the hyper-gradient estimator derived from our optimality condition enjoys two desirable properties: (i) it requires only Push-Sum communication of vectors and (ii) it can operate over time-varying directed networks. We confirm the convergence of our estimator to the true hyper-gradient both theoretically and empirically, and we further demonstrate that it enables two novel applications: decentralized influence estimation and personalization over time-varying networks.
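To make the averaging operation concrete, the following is a minimal sketch of generic Push-Sum averaging of vectors over a time-varying directed network. It is not the hyper-gradient estimator itself. The function names (`column_stochastic`, `push_sum`), the uniform splitting of each node's mass among its out-neighbours, and the two example graphs are all assumptions made for illustration.

```python
import numpy as np


def column_stochastic(adj):
    """Build a column-stochastic mixing matrix from a directed adjacency matrix.

    Convention (an assumption of this sketch): adj[i, j] = 1 means node j
    sends to node i. Every node also keeps a share for itself (self-loop),
    and splits its mass uniformly among its recipients.
    """
    n = adj.shape[0]
    a = np.maximum(adj.astype(float), np.eye(n))  # add self-loops
    return a / a.sum(axis=0, keepdims=True)       # each column sums to 1


def push_sum(values, graphs, num_rounds):
    """Run Push-Sum for num_rounds, cycling through the given graphs.

    values: array of shape (n, d), one d-dimensional vector per node.
    graphs: list of (n, n) adjacency matrices; graph t % len(graphs) is
            the communication topology in round t.
    Returns an (n, d) array holding each node's estimate of the average.
    """
    x = np.array(values, dtype=float)  # running numerators, one row per node
    w = np.ones(x.shape[0])            # running Push-Sum weights
    for t in range(num_rounds):
        p = column_stochastic(graphs[t % len(graphs)])
        x = p @ x                      # nodes push shares of their vectors
        w = p @ w                      # and the same shares of their weights
    return x / w[:, None]              # the ratio de-biases the estimate


# Hypothetical 4-node example: a directed ring alternating with a directed path.
ring = np.roll(np.eye(4), 1, axis=0)   # node j sends to node (j + 1) % 4
path = np.zeros((4, 4))
path[2, 0] = path[1, 2] = path[3, 1] = 1.0  # 0 -> 2 -> 1 -> 3

values = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [6.0, 60.0]])
estimates = push_sum(values, [ring, path], num_rounds=3000)
```

Only vectors (the rows of `x`) and scalar weights are exchanged in each round, and the mixing matrices need only be column-stochastic, so each node needs to know only its own out-degree. This is what allows the scheme to run on directed graphs that change from round to round.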