The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup, where the server, throughout the training process, has side information in the form of a global distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$ and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks to attain the same (and often higher) test accuracy with a bitrate up to $82$ times smaller than that of prior work -- corresponding to 2,650 times overall compression.
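To make the bitrate claim concrete, below is a minimal sketch of one way such a scheme can work, assuming an importance-sampling (minimal-random-coding style) channel-simulation step with shared randomness between client and server; the Gaussian setting, function names, and the candidate-count heuristic are illustrative assumptions, not the paper's exact algorithm. Because the server can regenerate the same candidate pool from its side information $p_{\theta}$, the client only needs to transmit an index, costing roughly $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ plus a small overhead in bits.

```python
import numpy as np

def norm_logpdf(x, mu, sigma):
    """Log-density of a Gaussian, used for importance weights."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def client_encode(mu_q, sigma_q, mu_p, sigma_p, kl_bits, seed=0):
    """Client: draw candidates from the side-information distribution p_theta,
    select one with probability proportional to the importance weight q/p,
    and return only its index (about kl_bits + O(1) bits)."""
    rng = np.random.default_rng(seed)
    # Heuristic (assumption): ~2^(KL + overhead) candidates suffice for a
    # good approximate sample from q.
    num_candidates = int(2 ** (kl_bits + 2))
    candidates = rng.normal(mu_p, sigma_p, size=num_candidates)
    log_w = (norm_logpdf(candidates, mu_q, sigma_q)
             - norm_logpdf(candidates, mu_p, sigma_p))
    w = np.exp(log_w - log_w.max())
    return rng.choice(num_candidates, p=w / w.sum())

def server_decode(idx, mu_p, sigma_p, kl_bits, seed=0):
    """Server: regenerate the identical candidate pool from p_theta using the
    shared seed and read off the sample the client selected."""
    rng = np.random.default_rng(seed)
    num_candidates = int(2 ** (kl_bits + 2))
    candidates = rng.normal(mu_p, sigma_p, size=num_candidates)
    return candidates[idx]

# Illustrative usage: the client's update distribution N(0.5, 1) is close in
# KL divergence to the server's side information N(0, 1), so a few bits suffice.
idx = client_encode(mu_q=0.5, sigma_q=1.0, mu_p=0.0, sigma_p=1.0, kl_bits=2.0)
sample = server_decode(idx, mu_p=0.0, sigma_p=1.0, kl_bits=2.0)
```

The key design point this sketch illustrates is that the communication cost is governed by how far the client's distribution drifts from the server's global one, not by the raw dimensionality of the update: when $q_{\phi^{(n)}}$ is close to $p_{\theta}$, the index can be encoded with very few bits.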