We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study how the generalization error evolves with the number of communication rounds between the clients and the parameter server, i.e., how the frequency with which the clients' local models are aggregated at the parameter server affects the generalization error. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that explicitly account for the number of rounds $R \in \mathbb{N}$, in addition to the number of participating devices $K$ and the individual dataset size $n$. The bounds, which hold for a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM) and derive more explicit bounds on the generalization error in this case. In particular, we show that the generalization error of FSVM increases with $R$, suggesting that more frequent communication with the parameter server diminishes the generalization power of such learning algorithms. Combined with the fact that the empirical risk generally decreases for larger values of $R$, this indicates that $R$ may be a parameter worth optimizing in order to minimize the population risk of FL algorithms. Moreover, in the special case $R=1$ (sometimes referred to as "one-shot" FL or distributed learning), our bounds suggest that the generalization error of the FL setting decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$, thereby generalizing recent findings in this direction to arbitrary loss functions and algorithms. The results are also validated experimentally.
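A minimal sketch of the trade-off in $R$ described above. The notation is introduced here for illustration only and is an assumption, not taken from the bounds themselves: $W_R$ denotes the aggregated model after $R$ rounds, $\mathcal{L}$ and $\hat{\mathcal{L}}$ denote its population and empirical risks, and $\mathrm{gen}(W_R)$ denotes their difference (the generalization error). By this definition, the population risk splits into two terms, and the number of rounds can then be chosen to minimize their sum:

$$
\mathcal{L}(W_R) \;=\; \hat{\mathcal{L}}(W_R) \;+\; \mathrm{gen}(W_R),
\qquad
R^\star \in \arg\min_{R \in \mathbb{N}} \Big( \mathbb{E}\big[\hat{\mathcal{L}}(W_R)\big] + \mathbb{E}\big[\mathrm{gen}(W_R)\big] \Big).
$$

If the first term decreases with $R$ while the second increases, as stated above for FSVM, the expected population risk need not be monotone in $R$. In that case, the minimizer $R^\star$ need not coincide with the largest feasible number of rounds.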