Data heterogeneity across clients is a key challenge in federated learning. Prior work addresses it either by aligning client and server models or by using control variates to correct client model drift. Although these methods converge quickly on convex or simple non-convex problems, their performance on over-parameterized models such as deep neural networks falls short. In this paper, we first revisit the widely used FedAvg algorithm on a deep neural network to understand how data heterogeneity influences the gradient updates across the network's layers. We observe that while FedAvg learns the feature extraction layers efficiently, the substantial diversity of the final classification layers across clients impedes performance. Motivated by this, we propose to correct model drift by applying variance reduction only to the final layers. We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost. We also prove a convergence rate for our algorithm.
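The abstract does not state the update rule, so the following is only an illustrative sketch of what "variance reduction only on the final layers" could look like, assuming SCAFFOLD-style control variates. The function names (`local_update`, `refresh_control_variate`), the parameter keys `"body"` and `"head"`, and the learning-rate and step-count values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def local_update(params, grads_fn, c_global, c_local,
                 lr=0.1, steps=5, corrected=("head",)):
    """Run local SGD on one client.

    Assumption: parameters listed in `corrected` (here the classifier head)
    receive a control-variate correction g - c_local + c_global; all other
    parameters take plain FedAvg-style SGD steps.
    """
    w = {k: v.copy() for k, v in params.items()}
    for _ in range(steps):
        g = grads_fn(w)
        for k in w:
            step = g[k]
            if k in corrected:
                # Drift correction applied to the final layers only.
                step = step - c_local[k] + c_global[k]
            w[k] = w[k] - lr * step
    return w

def refresh_control_variate(w_start, w_end, c_global, c_local,
                            lr=0.1, steps=5, corrected=("head",)):
    """Update the client control variate from the net local displacement.

    Assumption: this mirrors the displacement-based refresh used by
    SCAFFOLD-style methods, restricted to the corrected parameters.
    """
    return {
        k: c_local[k] - c_global[k] + (w_start[k] - w_end[k]) / (steps * lr)
        for k in corrected
    }
```

Under these assumptions, only the control variates for the final layers would need to be stored and exchanged, which is consistent with the abstract's claim of a similar or lower communication cost.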