We study the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, there are $K$ devices or clients, each holding its own independent dataset of size $n$. Individual models, learned locally via Stochastic Gradient Descent, are aggregated (averaged) by a central server into a global model and then sent back to the devices. We consider multiple (say $R \in \mathbb{N}^*$) rounds of model aggregation and study the effect of $R$ on the generalization error of the final aggregated model. We establish an upper bound on the generalization error that accounts explicitly for the effect of $R$ (in addition to the number of participating devices $K$ and the dataset size $n$). For fixed $(n, K)$, the bound increases with $R$, suggesting that the generalization of such learning algorithms is negatively affected by more frequent communication with the parameter server. Since the empirical risk, by contrast, generally decreases for larger values of $R$, this indicates that $R$ might be a parameter to optimize in order to reduce the population risk of FL algorithms. The results of this paper, which extend straightforwardly to the heterogeneous data setting, are also illustrated through numerical examples.
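The training procedure described above ($K$ clients, local SGD, server-side averaging, $R$ rounds) can be sketched as follows. This is only an illustrative sketch under assumptions that are not taken from the paper: a linear model with squared loss, synthetic Gaussian data, a fixed per-client SGD budget `total_steps` split evenly across the rounds, and hypothetical helper names (`local_sgd`, `federated_averaging`, `make_clients`, `risk`). It is not the authors' experimental setup.

```python
import numpy as np


def local_sgd(w, X, y, steps, lr, rng):
    """Run `steps` single-sample SGD updates on the squared loss, starting from w."""
    w = w.copy()
    for _ in range(steps):
        i = rng.integers(len(y))            # sample one local data point
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad
    return w


def federated_averaging(datasets, R, total_steps, lr=0.05, seed=0):
    """R rounds of: broadcast global model, local SGD on each client, average.

    Assumption (ours): each client's budget of `total_steps` SGD updates is
    split evenly across the R rounds, so larger R means more frequent averaging.
    """
    rng = np.random.default_rng(seed)
    d = datasets[0][0].shape[1]
    w_global = np.zeros(d)
    steps_per_round = total_steps // R
    for _ in range(R):
        local_models = [
            local_sgd(w_global, X, y, steps_per_round, lr, rng)
            for X, y in datasets
        ]
        w_global = np.mean(local_models, axis=0)  # server-side averaging
    return w_global


def make_clients(K, n, d, w_true, noise, rng):
    """Draw K independent synthetic datasets of size n (homogeneous setting)."""
    clients = []
    for _ in range(K):
        X = rng.standard_normal((n, d))
        y = X @ w_true + noise * rng.standard_normal(n)
        clients.append((X, y))
    return clients


def risk(w, X, y):
    """Average squared loss of w on (X, y)."""
    return 0.5 * np.mean((X @ w - y) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, n, d = 10, 100, 5
    w_true = np.ones(d)
    clients = make_clients(K, n, d, w_true, noise=0.5, rng=rng)
    X_train = np.vstack([X for X, _ in clients])
    y_train = np.concatenate([y for _, y in clients])
    # A large fresh sample stands in for the population risk.
    X_test = rng.standard_normal((20000, d))
    y_test = X_test @ w_true + 0.5 * rng.standard_normal(20000)
    for R in (1, 5, 25):
        w = federated_averaging(clients, R=R, total_steps=500)
        gap = risk(w, X_test, y_test) - risk(w, X_train, y_train)
        print(f"R={R:2d}  empirical risk={risk(w, X_train, y_train):.4f}  gap={gap:.4f}")
```

The printed gap is a Monte Carlo estimate of the generalization error for a single run. On a toy problem like this one it need not reproduce the trend in $R$ that the bound predicts.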