A central challenge in federated learning is the high communication cost of sending weight updates from clients to the server in every round. While prior work has made substantial progress in compressing these updates through gradient compression, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial \emph{random} values and learns how to sparsify the random network for the best performance. To this end, the clients collaboratively train a \emph{stochastic} binary mask to find the optimal sparse random network within the original one. At the end of training, the final model is a sparse network with random weights, that is, a subnetwork of the dense random network. We show improvements over relevant baselines in accuracy, communication cost (less than $1$ bit per parameter (bpp)), convergence speed, and final model size (less than $1$ bpp) on the MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low-bitrate regime and under various system configurations.
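The idea of training a stochastic mask over frozen random weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method as specified here: the names `sample_mask` and `aggregate_masks`, the use of plain averaging on the server, the toy sizes, and the omission of local training. Each hypothetical client uploads only a sampled binary mask, which costs at most 1 bit per parameter before any entropy coding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen random weights: drawn once and never updated (toy size, assumed).
weights = rng.standard_normal(8)


def sample_mask(probs, rng):
    """Draw a stochastic binary mask: entry i is 1 with probability probs[i]."""
    return (rng.random(probs.shape) < probs).astype(np.uint8)


def aggregate_masks(client_masks):
    """Assumed server step: average the binary masks received from clients
    to obtain per-parameter keep-probabilities for the next round."""
    return np.mean(np.stack(client_masks), axis=0)


# One illustrative round with 4 hypothetical clients.
# Local training of the probabilities is omitted in this sketch; each
# client simply samples a mask from the current global probabilities.
probs = np.full(weights.shape, 0.5)
client_masks = [sample_mask(probs, rng) for _ in range(4)]  # uplink: binary masks only
probs = aggregate_masks(client_masks)

# Final model: a sparse subnetwork of the frozen random weights.
final_mask = sample_mask(probs, rng)
sparse_weights = weights * final_mask
```

In this sketch the weights themselves are never transmitted or modified; only the mask probabilities change from round to round. A strongly biased mask (mostly zeros or mostly ones) has entropy below 1 bit per entry, which is one way a bitrate under $1$ bpp could arise once the masks are entropy coded.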