In federated learning (FL), the common paradigm proposed by FedAvg and followed by most algorithms is that clients train local models on their private data and share the model parameters for central aggregation, typically by averaging. In this paradigm, communication cost is often a challenge, as modern neural networks can contain millions to billions of parameters. We propose that clients share local data summaries instead of model parameters, to reduce the cost of sharing. We develop FedLog, a new algorithm based on Bayesian inference that shares only sufficient statistics of the local data. FedLog transmits messages as small as the last layer of the original model. Comprehensive experiments show that FedLog outperforms other FL algorithms that aim to reduce communication cost. To provide formal privacy guarantees, we further extend FedLog with differential privacy and demonstrate the trade-off between privacy budget and accuracy.
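To make the claimed savings concrete, the sketch below compares the per-round message size of FedAvg-style parameter sharing against sharing only a message the size of the last layer, as the abstract describes. The network dimensions are illustrative assumptions, not the settings used in the paper, and the helper names are hypothetical.

```python
# Hypothetical sketch: per-round message sizes (in floats) for full parameter
# sharing vs. a last-layer-sized summary. Shapes below are assumptions for
# illustration only.

def message_size_full(layer_shapes):
    """Floats transmitted when a client shares every model parameter (FedAvg-style)."""
    return sum(rows * cols for rows, cols in layer_shapes)

def message_size_last_layer(layer_shapes):
    """Floats transmitted when a client shares only a message the size of the last layer."""
    rows, cols = layer_shapes[-1]
    return rows * cols

# A toy network: three hidden layers plus a small classification head.
shapes = [(784, 512), (512, 512), (512, 256), (256, 10)]
full = message_size_full(shapes)            # 797,184 floats
summary = message_size_last_layer(shapes)   # 2,560 floats
print(f"compression ratio: {full / summary:.1f}x")  # roughly 311x for this toy network
```

Even for this small toy model the last-layer message is over two orders of magnitude smaller; the gap widens further for modern networks with deep feature extractors and comparatively tiny output heads.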