Federated learning (FL) involves multiple heterogeneous clients collaboratively training a global model via iterative local updates and model fusion. The global model in FL generalizes considerably worse than a centrally trained model, and this gap is a bottleneck for broader application. In this paper, we study and improve FL's generalization from a fundamental ``connectivity'' perspective, that is, how the local models are connected in the parameter region and fused into a generalized global model. The term ``connectivity'' is derived from linear mode connectivity (LMC), which studies the interpolated loss landscape between two different solutions (i.e., modes) of neural networks. To bridge the gap between LMC and FL, we leverage fixed anchor models to empirically and theoretically study the transitivity property of connectivity from two models (LMC) to a group of models (model fusion in FL). Based on the findings, we propose FedGuCci and FedGuCci+, which improve group connectivity for better generalization. We show that our methods boost the generalization of FL under client heterogeneity across various tasks (4 CV datasets and 6 NLP datasets), models (both convolutional and transformer-based), and training paradigms (both from-scratch and pretrain-finetune).
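To make the notion of connectivity concrete, the sketch below computes a loss barrier along the linear path between two solutions, and the analogous gap for a uniformly averaged group of models. It is only an illustration under stated assumptions: the barrier definition is the one commonly used in the LMC literature and may differ from the exact formulation in this paper, and `toy_loss`, `loss_barrier`, and `group_barrier` are hypothetical names on a toy two-parameter loss, not part of FedGuCci.

```python
import numpy as np


def toy_loss(w):
    # Hypothetical double-well loss with two minima ("modes") at (+1, 0) and (-1, 0).
    return (w[0] ** 2 - 1.0) ** 2 + w[1] ** 2


def loss_barrier(loss_fn, w_a, w_b, num_points=11):
    # Largest gap between the loss on the linear path from w_b to w_a
    # and the linear interpolation of the two endpoint losses.
    alphas = np.linspace(0.0, 1.0, num_points)
    loss_a, loss_b = loss_fn(w_a), loss_fn(w_b)
    gaps = [
        loss_fn(a * w_a + (1.0 - a) * w_b) - (a * loss_a + (1.0 - a) * loss_b)
        for a in alphas
    ]
    return float(max(gaps))


def group_barrier(loss_fn, models):
    # Assumed group analogue: loss of the uniformly averaged (fused) model
    # minus the mean loss of the individual models.
    fused = np.mean(models, axis=0)
    return float(loss_fn(fused) - np.mean([loss_fn(w) for w in models]))


w_a = np.array([1.0, 0.0])
w_b = np.array([-1.0, 0.0])
print("pairwise barrier:", loss_barrier(toy_loss, w_a, w_b))
print("group barrier:", group_barrier(toy_loss, [w_a, w_b]))
```

A barrier near zero means the models are linearly connected, so their average stays in a low-loss region. In this toy case the two modes lie in different basins, so the averaged model sits on the ridge between them and both barriers are large.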