The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods, in which client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$ and the server estimates the mean of the clients' distributions from these samples. However, such methods do not take full advantage of the FL setup, in which the server has, throughout training, side information in the form of a pre-data distribution $p_{\theta}$ that is close to each client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$ and the server's side information $p_{\theta}$, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks, such as FedPM, Federated SGLD, and QSGD, to attain the same (and often higher) test accuracy with up to a 50-fold reduction in bitrate.
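To make the "approximately $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ bits" claim concrete, the following is a minimal sketch rather than the method of this work. It assumes an importance-sampling style coder (often called minimal random coding), product-of-Bernoulli distributions as a stand-in for mask-based schemes, and a seed shared by client and server. All function names, the toy values of `p` and `q`, and the 4-bit overhead are hypothetical choices made for illustration.

```python
import numpy as np

def kl_bits(q, p):
    """D_KL(q || p) in bits for products of Bernoullis (entries strictly in (0, 1))."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    kl_nats = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return float(kl_nats.sum() / np.log(2))

def candidates(p, num_candidates, shared_seed):
    """Draw candidate samples from p; both sides can regenerate them from the seed."""
    p = np.asarray(p, float)
    rng = np.random.default_rng(shared_seed)
    return (rng.random((num_candidates, p.size)) < p).astype(np.int8)

def client_encode(q, p, num_candidates, shared_seed, private_rng):
    """Pick a candidate index with probability proportional to q(x) / p(x)."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    x = candidates(p, num_candidates, shared_seed)
    log_w = (x * np.log(q / p) + (1 - x) * np.log((1 - q) / (1 - p))).sum(axis=1)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return int(private_rng.choice(num_candidates, p=w / w.sum()))

def server_decode(p, num_candidates, shared_seed, index):
    """Regenerate the same candidates and return the one the client selected."""
    return candidates(p, num_candidates, shared_seed)[index]

# Toy example: server-side p and a nearby client-side q (values are made up).
p = np.full(8, 0.5)
q = np.array([0.6, 0.4, 0.55, 0.5, 0.45, 0.6, 0.5, 0.4])

bits = kl_bits(q, p)
# Assumption: use about 2^(KL + overhead) candidates; the overhead of 4 is arbitrary.
index_bits = int(np.ceil(bits)) + 4
num_candidates = 2 ** index_bits

index = client_encode(q, p, num_candidates, shared_seed=0,
                      private_rng=np.random.default_rng(1))
sample = server_decode(p, num_candidates, shared_seed=0, index=index)
print(f"KL = {bits:.3f} bits, index costs {index_bits} bits, sample = {sample}")
```

Only `index` crosses the network, so the message length is `index_bits` regardless of the dimension of the sample. The closer $q_{\phi^{(n)}}$ is to $p_{\theta}$, the fewer candidates are needed for the selected sample to be approximately distributed according to $q_{\phi^{(n)}}$.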