Federated Learning: You May Communicate Less Often!

We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study how the generalization error evolves with the number of communication rounds between the clients and the parameter server, i.e., how the frequency at which the clients' local models are aggregated at the parameter server affects the generalization error. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that explicitly account for the number of rounds $R \in \mathbb{N}$, in addition to the number of participating devices $K$ and the individual dataset size $n$. The bounds, which hold in their generality for a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM) and derive more explicit bounds on the generalization error in this case. In particular, we show that the generalization error of FSVM increases with $R$, suggesting that more frequent communication with the parameter server diminishes the generalization power of such learning algorithms. Since the population risk is the sum of the empirical risk and the generalization error, and the empirical risk generally decreases for larger values of $R$, this indicates that $R$ may be a parameter to optimize in order to minimize the population risk of FL algorithms. Moreover, specialized to the case $R=1$ (sometimes referred to as "one-shot" FL or distributed learning), our bounds suggest that the generalization error of the FL setting decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$, thereby generalizing recent findings in this direction to arbitrary loss functions and algorithms. The results of this paper are also validated experimentally.
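The round structure described above can be illustrated with a toy simulation. The sketch below is not the authors' algorithm or experimental setup; it assumes a hypothetical FSVM-like procedure in which each of the $K$ clients runs plain SGD on an L2-regularized hinge loss, a fixed budget of local steps is split evenly across $R$ rounds, and the parameter server averages the local models after each round. The data generator, step size, regularization constant, and all function names are illustrative assumptions.

```python
import random

# Hypothetical FSVM-like procedure for illustration (an assumption, not the
# paper's exact algorithm): local SGD on an L2-regularized hinge loss, with
# the parameter server averaging the K local models after each round.


def hinge_loss(w, data):
    """Average hinge loss max(0, 1 - y<w, x>) over (x, y) pairs with y in {-1, +1}."""
    total = 0.0
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        total += max(0.0, 1.0 - margin)
    return total / len(data)


def local_sgd(w, data, steps, lr, lam, rng):
    """Run `steps` SGD steps on lam/2 * ||w||^2 + hinge loss, starting from a copy of w."""
    w = list(w)
    for _ in range(steps):
        x, y = rng.choice(data)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        for j in range(len(w)):
            # Subgradient: lam * w_j, minus y * x_j when the margin is violated.
            g = lam * w[j] - (y * x[j] if margin < 1.0 else 0.0)
            w[j] -= lr * g
    return w


def federated_svm(client_data, rounds, total_local_steps, lr=0.05, lam=0.01, seed=0):
    """Split a fixed budget of local steps evenly over `rounds` communication rounds.

    Any remainder of total_local_steps / rounds is dropped for simplicity.
    """
    rng = random.Random(seed)
    d = len(client_data[0][0][0])
    w_global = [0.0] * d
    steps_per_round = total_local_steps // rounds
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local_models = [local_sgd(w_global, data, steps_per_round, lr, lam, rng)
                        for data in client_data]
        # Parameter server: coordinate-wise average of the K local models.
        w_global = [sum(ws) / len(local_models) for ws in zip(*local_models)]
    return w_global


def sample(n, d, rng, noise=0.1):
    """Hypothetical data: x ~ N(0, I_d), y = sign(x_0), label flipped w.p. `noise`."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = 1 if x[0] >= 0 else -1
        if rng.random() < noise:
            y = -y
        data.append((x, y))
    return data


if __name__ == "__main__":
    rng = random.Random(42)
    K, n, d = 10, 20, 5
    clients = [sample(n, d, rng) for _ in range(K)]
    train_set = [pt for data in clients for pt in data]
    test_set = sample(5000, d, rng)  # large fresh sample as a population proxy
    for R in (1, 5, 25):
        w = federated_svm(clients, rounds=R, total_local_steps=100)
        gap = hinge_loss(w, test_set) - hinge_loss(w, train_set)
        print(f"R={R:2d}  estimated generalization gap = {gap:.3f}")
```

Holding `total_local_steps` fixed while varying `rounds` isolates the effect of aggregation frequency. With a single client ($K=1$) averaging is the identity, so $R$ has no effect in this sketch; the interplay only appears for $K>1$. The gaps printed by one toy run are not evidence for or against the paper's bounds.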



Generalization error


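The generalization ability of a learning method is the ability of the model it learns to predict unseen data, and it is an essential property of the method. In practice, generalization ability is most often assessed by measuring the generalization error. Generalization error bounds characterize both the deviation between a learning algorithm's empirical risk and its expected risk and the rate at which this deviation shrinks. In the student-teacher view of learning, the generalization error is a function describing how far a student machine, after learning from sample data, remains from the teacher machine.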
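To make the definition concrete, the sketch below estimates the generalization error of an empirical-risk-minimizing hypothesis as the difference between its population risk, approximated on a large fresh sample, and its empirical risk on the training set. The 1-D threshold classifier, the noisy synthetic distribution, and all function names are illustrative assumptions, not taken from the source.

```python
import random


def zero_one_risk(threshold, data):
    """Fraction of (x, y) pairs misclassified by y_hat = 1 if x >= threshold else 0."""
    errors = sum(1 for x, y in data if (1 if x >= threshold else 0) != y)
    return errors / len(data)


def draw(n, rng, noise=0.2):
    """Hypothetical distribution: x ~ U[0, 1), y = 1[x >= 0.5], flipped w.p. `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def erm_threshold(train):
    """Empirical risk minimization over thresholds placed at the training points.

    The extra candidate 1.1 lies above every x in [0, 1) and predicts all zeros.
    """
    candidates = [x for x, _ in train] + [1.1]
    return min(candidates, key=lambda t: zero_one_risk(t, train))


rng = random.Random(0)
train = draw(30, rng)
test = draw(100_000, rng)  # large fresh sample as a proxy for the population
t_hat = erm_threshold(train)
emp = zero_one_risk(t_hat, train)   # empirical risk of the learned rule
pop = zero_one_risk(t_hat, test)    # estimated population (expected) risk
print(f"empirical {emp:.3f}  population~{pop:.3f}  gap~{pop - emp:.3f}")
```

Because the threshold is chosen to fit the 30 training points, its empirical risk tends to understate its risk on fresh data; the printed gap is a Monte Carlo estimate of that generalization error for one draw of the training set.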
