In federated learning (FL), a global model is produced by weighted aggregation of local models, where the aggregation weights are normalized (they sum to 1) and proportional to the local data sizes. In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. First, we find that the sum of the weights can be smaller than 1, which causes a global weight shrinking effect (analogous to weight decay) and improves generalization. We explore how the optimal shrinking factor is affected by clients' data heterogeneity and the number of local epochs. Second, we examine the relative aggregation weights among clients, which reflect each client's importance. We develop a notion of client coherence to study the learning dynamics and find that a critical point exists. Before training reaches the critical point, more coherent clients play a more essential role in generalization. Based on these insights, we propose an effective method, Federated Learning with Learnable Aggregation Weights (FedLAW). Extensive experiments verify that our method improves the generalization of the global model by a large margin across different datasets and models.
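To make the global weight shrinking effect concrete, the following is a minimal sketch of weighted aggregation in which the weights stay proportional to local data sizes but sum to a factor `gamma` that may be smaller than 1. The function name `aggregate`, the parameter name `gamma`, and the flat-list model representation are assumptions made for illustration; in FedLAW the aggregation weights are learned, whereas this sketch fixes them by hand.

```python
from typing import List, Sequence


def aggregate(
    local_models: Sequence[Sequence[float]],
    data_sizes: Sequence[int],
    gamma: float = 1.0,
) -> List[float]:
    """Aggregate flat parameter vectors with weights that sum to `gamma`.

    gamma == 1.0 gives the usual normalized, size-proportional average.
    gamma < 1.0 additionally scales the global model toward zero, which is
    the shrinking effect the text compares to weight decay.
    """
    total = sum(data_sizes)
    # Relative weights proportional to data size, then scaled by gamma.
    weights = [gamma * n / total for n in data_sizes]
    dim = len(local_models[0])
    return [
        sum(w * model[j] for w, model in zip(weights, local_models))
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients with 1 and 3 samples, respectively.
    models = [[1.0, 2.0], [3.0, 4.0]]
    sizes = [1, 3]
    print(aggregate(models, sizes))             # normalized average
    print(aggregate(models, sizes, gamma=0.9))  # same direction, shrunk by 0.9
```

With `gamma = 1.0` this reduces to the standard normalized average. Any `gamma < 1.0` leaves the clients' relative importance unchanged and only rescales the resulting global model.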