We study the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, there are $K$ devices or clients, each holding its own independent dataset of size $n$. Individual models, learned locally via Stochastic Gradient Descent, are aggregated (averaged) by a central server into a global model, which is then sent back to the devices. We consider multiple (say $R \in \mathbb N^*$) rounds of model aggregation and study the effect of $R$ on the generalization error of the final aggregated model. We establish an upper bound on the generalization error that accounts explicitly for the effect of $R$, in addition to the number of participating devices $K$ and the dataset size $n$. We observe that, for fixed $(n, K)$, the bound increases with $R$, suggesting that more frequent communication with the parameter server degrades the generalization of such learning algorithms. Since the empirical risk, by contrast, generally decreases for larger values of $R$, this indicates that $R$ might be a parameter to optimize in order to reduce the population risk of FL algorithms. The results of this paper, which extend straightforwardly to the heterogeneous data setting, are also illustrated through numerical examples.
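The setup described in the abstract ($K$ clients, local SGD, server-side averaging, $R$ rounds) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's experimental code: the linear-regression task, the squared loss, the fixed per-client budget of SGD steps split evenly across rounds, and all function names (`local_sgd`, `federated_sgd`, `risk`, `generalization_gap`, `make_data`) are assumptions introduced here. The generalization error is estimated as the difference between the risk on a fresh sample (a proxy for the population risk) and the empirical risk on the clients' training data.

```python
import numpy as np


def local_sgd(w, X, y, steps, lr, rng):
    """Run `steps` single-sample SGD updates on the squared loss, starting from w."""
    w = w.copy()
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad
    return w


def federated_sgd(client_data, rounds, total_steps, lr, seed=0):
    """Each client runs total_steps // rounds local steps per round;
    the server averages the local models after every round.
    Splitting a fixed step budget across rounds is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    dim = client_data[0][0].shape[1]
    w_global = np.zeros(dim)
    steps_per_round = total_steps // rounds
    for _ in range(rounds):
        local_models = [
            local_sgd(w_global, X, y, steps_per_round, lr, rng)
            for X, y in client_data
        ]
        w_global = np.mean(local_models, axis=0)  # aggregation by averaging
    return w_global


def risk(w, X, y):
    """Average squared loss of the linear model w on (X, y)."""
    return 0.5 * np.mean((X @ w - y) ** 2)


def generalization_gap(w, client_data, X_test, y_test):
    """Test-sample risk (proxy for population risk) minus empirical risk."""
    X_train = np.vstack([X for X, _ in client_data])
    y_train = np.concatenate([y for _, y in client_data])
    return risk(w, X_test, y_test) - risk(w, X_train, y_train)


def make_data(num, dim, w_true, rng, noise=0.5):
    """Synthetic (hypothetical) linear-regression data with Gaussian noise."""
    X = rng.normal(size=(num, dim))
    y = X @ w_true + noise * rng.normal(size=num)
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    K, n, dim = 10, 30, 5  # number of clients, samples per client, dimension
    w_true = rng.normal(size=dim)
    clients = [make_data(n, dim, w_true, rng) for _ in range(K)]
    X_test, y_test = make_data(5000, dim, w_true, rng)
    for R in (1, 5, 25):
        w = federated_sgd(clients, rounds=R, total_steps=500, lr=0.01)
        gap = generalization_gap(w, clients, X_test, y_test)
        print(f"R={R:2d}  estimated generalization gap = {gap:+.4f}")
```

Running the script prints one estimated gap per value of $R$ for fixed $(n, K)$. A single run on a toy problem is noisy, so any trend in $R$ should be checked by averaging over many seeds before comparing it with the bound.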