The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods, in which client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$ and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup, where the server has, throughout the training process, side information in the form of a pre-data distribution $p_{\theta}$ that is close to each client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$ and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with a reduction of up to $50$ times in the bitrate.
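To make the "approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits" claim concrete, the following is a minimal sketch of one well-known way to send a sample using shared randomness and importance sampling (in the style of minimal random coding). It is an illustration under stated assumptions, not necessarily the exact scheme of this work: both distributions are assumed to be products of Bernoulli distributions with parameters strictly between 0 and 1, client and server are assumed to share a random seed, and all function names (`kl_bits`, `candidates`, `client_encode`, `server_decode`) are hypothetical. The client sends only an index into a list of candidates drawn from $p_{\theta}$, so the message costs $\log_2$ of the number of candidates; choosing that number on the order of $2^{D_{KL}}$ gives a cost of roughly $D_{KL}$ bits.

```python
import math
import random

def kl_bits(q, p):
    """D_KL(q || p) in bits for products of Bernoulli distributions.
    Assumes every entry of p lies strictly between 0 and 1."""
    total = 0.0
    for qi, pi in zip(q, p):
        for a, b in ((qi, pi), (1 - qi, 1 - pi)):
            if a > 0:
                total += a * math.log2(a / b)
    return total

def candidates(p, num, seed):
    """Draw `num` binary vectors from the server-side distribution p.
    The seed is assumed to be shared, so both sides get the same list."""
    rng = random.Random(seed)
    return [[1 if rng.random() < pi else 0 for pi in p] for _ in range(num)]

def client_encode(q, p, num, seed, rng):
    """Choose a candidate index with probability proportional to q(x)/p(x)."""
    log_weights = []
    for x in candidates(p, num, seed):
        lw = 0.0
        for xi, qi, pi in zip(x, q, p):
            lw += math.log(qi / pi) if xi else math.log((1 - qi) / (1 - pi))
        log_weights.append(lw)
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    return rng.choices(range(num), weights=weights)[0]

def server_decode(index, p, num, seed):
    """Regenerate the shared candidates and return the selected one."""
    return candidates(p, num, seed)[index]

# Illustrative values only (not taken from the paper).
p = [0.5, 0.5, 0.5, 0.5]   # server side information
q = [0.6, 0.4, 0.7, 0.5]   # client distribution, close to p
num = 16                   # 16 candidates, so the index costs 4 bits
idx = client_encode(q, p, num, seed=0, rng=random.Random(1))
sample = server_decode(idx, p, num, seed=0)
print(f"KL = {kl_bits(q, p):.3f} bits, sent index {idx}, decoded {sample}")
```

The closer $q_{\phi^{(n)}}$ is to $p_{\theta}$, the fewer candidates are needed for the selected sample to be approximately distributed according to $q_{\phi^{(n)}}$, which is why the side information at the server translates directly into a lower bitrate.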