To mitigate the privacy leakage and communication burden of Federated Learning (FL), decentralized FL (DFL) discards the central server, and each client communicates only with its neighbors in a decentralized communication network. However, existing DFL methods suffer from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topologies. To alleviate this issue, we propose two DFL algorithms, DFedSAM and DFedSAM-MGS. Specifically, DFedSAM leverages gradient perturbation to generate local flat models via Sharpness-Aware Minimization (SAM), which searches for models with uniformly low loss values. DFedSAM-MGS further boosts DFedSAM by adopting Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of local flat models and better balances communication complexity and generalization. Theoretically, we present improved convergence rates $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{1}{K^{1/2}T^{3/2}(1-\lambda)^2}\big)$ and $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{\lambda^Q+1}{K^{1/2}T^{3/2}(1-\lambda^Q)^2}\big)$ in the non-convex setting for DFedSAM and DFedSAM-MGS, respectively, where $1-\lambda$ is the spectral gap of the gossip matrix and $Q$ is the number of gossip steps in MGS. Empirically, our methods achieve competitive performance compared with CFL methods and outperform existing DFL methods.
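The two ingredients described above, a SAM-perturbed local gradient and repeated gossip averaging, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ring topology, the uniform 1/3 mixing weights, the quadratic client losses, and all function names (`sam_gradient`, `ring_gossip_matrix`, `gossip`, `run_round`) and hyperparameter values are assumptions made purely for illustration.

```python
import numpy as np

def sam_gradient(grad_fn, w, rho=0.05):
    """Gradient evaluated at the SAM-perturbed point w + rho * g / ||g||."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent step toward higher loss
    return grad_fn(w + eps)

def ring_gossip_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m clients (assumed topology)."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] += 1.0 / 3.0
    return W

def gossip(X, W, Q=1):
    """Apply Q gossip steps; each row of X is one client's model."""
    for _ in range(Q):
        X = W @ X
    return X

def consensus_error(X):
    """Frobenius distance between the client models and their average."""
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True))

def run_round(X, W, grad_fns, lr=0.1, rho=0.05, local_steps=5, Q=1):
    """One communication round: local SAM updates, then Q gossip steps."""
    X = X.copy()
    for i, grad_fn in enumerate(grad_fns):
        for _ in range(local_steps):
            X[i] -= lr * sam_gradient(grad_fn, X[i], rho)
    return gossip(X, W, Q)

# Toy heterogeneous setup (assumed): client i minimizes 0.5 * ||w - c_i||^2.
rng = np.random.default_rng(0)
m, d = 8, 4
centers = rng.normal(size=(m, d))
grad_fns = [lambda w, c=c: w - c for c in centers]
W = ring_gossip_matrix(m)
X0 = np.zeros((m, d))

err_q1 = consensus_error(run_round(X0, W, grad_fns, Q=1))
err_q4 = consensus_error(run_round(X0, W, grad_fns, Q=4))
```

In this toy setting, running more gossip steps per round leaves the client models closer to their average, at the cost of `Q` times as many neighbor exchanges. This mirrors the consistency versus communication trade-off that the abstract attributes to MGS, and the replacement of $\lambda$ by $\lambda^Q$ in the second rate.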