To mitigate the privacy leakage and communication burden of Federated Learning (FL), decentralized FL (DFL) discards the central server, and each client communicates only with its neighbors in a decentralized communication network. However, existing DFL methods suffer from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topologies. To alleviate this issue, we propose two DFL algorithms, DFedSAM and DFedSAM-MGS. Specifically, DFedSAM leverages gradient perturbation to generate local flat models via Sharpness Aware Minimization (SAM), which searches for models with uniformly low loss values. DFedSAM-MGS further boosts DFedSAM by adopting Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of local flat models and better balances communication complexity and generalization. Theoretically, we present improved convergence rates $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{1}{K^{1/2}T^{3/2}(1-\lambda)^2}\big)$ and $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{\lambda^Q+1}{K^{1/2}T^{3/2}(1-\lambda^Q)^2}\big)$ in the non-convex setting for DFedSAM and DFedSAM-MGS, respectively, where $1-\lambda$ is the spectral gap of the gossip matrix and $Q$ is the number of gossip steps in MGS. Empirically, our methods achieve competitive performance compared with CFL methods and outperform existing DFL methods.
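The two ingredients described above, local SAM updates followed by multiple gossip steps, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the quadratic local losses, the ring topology with uniform 1/3 weights, the function names, and all hyperparameter values (`lr`, `rho`, `K`, `Q`) are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def sam_step(x, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: perturb along the normalized gradient, then descend
    using the gradient evaluated at the perturbed point."""
    g = grad_fn(x)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent perturbation of radius rho
    return x - lr * grad_fn(x + eps)

def ring_gossip_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m >= 3 clients
    (assumed topology: each client averages itself and its two neighbors)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0
    return W

def second_largest_abs_eig(W):
    """lambda in the abstract's notation; 1 - lambda is the spectral gap."""
    eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return float(eigs[-2])

def dfedsam_mgs_round(X, targets, W, K=5, Q=1, lr=0.1, rho=0.05):
    """One communication round on toy data.
    X: (m, d) array of local models; targets: (m, d) client-specific optima.
    Each client runs K local SAM steps, then all clients run Q gossip steps."""
    X = X.copy()
    for i in range(X.shape[0]):
        # Assumed local loss f_i(x) = 0.5 * ||x - t_i||^2, so grad f_i(x) = x - t_i.
        grad_fn = lambda x, t=targets[i]: x - t
        for _ in range(K):
            X[i] = sam_step(X[i], grad_fn, lr, rho)
    for _ in range(Q):
        X = W @ X  # each gossip step averages every model with its neighbors
    return X

def consensus_error(X):
    """Distance of the local models from their average (model inconsistency)."""
    return float(np.linalg.norm(X - X.mean(axis=0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d = 8, 4
    W = ring_gossip_matrix(m)
    X0 = rng.normal(size=(m, d))
    targets = rng.normal(size=(m, d))  # heterogeneous clients
    for Q in (1, 4):
        X = dfedsam_mgs_round(X0, targets, W, Q=Q)
        print(f"Q={Q}: consensus error = {consensus_error(X):.4f}")
    print(f"lambda = {second_largest_abs_eig(W):.4f}")
```

Setting `Q=1` corresponds to the single gossip step of DFedSAM in this toy setting, and larger `Q` to DFedSAM-MGS. Because `Q` gossip steps multiply the models by `W` raised to the power `Q`, the effective mixing parameter becomes $\lambda^Q$, which is the quantity that appears in the second convergence rate.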