In federated learning (FL), a machine learning (ML) model is collectively trained by a large number of users, using the private data on their local devices. With top-$r$ sparsification in FL, the users upload only the most significant $r$ fraction of updates, and the servers send only the most significant $r'$ fraction of parameters to the users, in order to reduce the communication cost. However, both the values and the indices of the sparse updates leak information about the users' private data. In this work, we consider an FL setting where $N$ non-colluding databases store the model to be trained, from which the users download and update sparse parameters privately, without revealing the values of the updates or their indices to the databases. We propose four schemes with different properties to perform this task while achieving the minimum communication costs, and show that the information-theoretic privacy of both the values and the positions of the sparse updates can be guaranteed. This privacy comes at a considerable storage cost, however. To alleviate this, we generalize the schemes using a model segmentation mechanism, so that the storage cost is reduced at the expense of a certain amount of information leakage. Overall, we characterize the tradeoff among communication cost, storage cost, and information leakage in private FL with top-$r$ sparsification.
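To make the sparsification step concrete, the following is a minimal sketch of plain (non-private) top-$r$ selection on a single user's update vector. It is not one of the four proposed schemes; it only illustrates which quantities (indices and values) a user would otherwise expose. The function name `top_r_sparsify`, the use of magnitude as the measure of significance, and the rounding rule `ceil(r * L)` are assumptions made for illustration.

```python
import math

def top_r_sparsify(update, r):
    """Keep the ceil(r * len(update)) largest-magnitude entries of `update`.

    Returns (indices, values). Assumption: "most significant" means
    largest absolute value; ties are broken by lower index.
    """
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    k = math.ceil(r * len(update))
    # Rank positions by absolute value, largest first (sorted() is stable).
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    indices = sorted(ranked[:k])            # positions of the kept entries
    values = [update[i] for i in indices]   # the kept entries themselves
    return indices, values

# Hypothetical update vector with L = 8 parameters and r = 0.25.
update = [0.02, -1.5, 0.3, 0.0, 0.9, -0.05, 0.01, -0.4]
indices, values = top_r_sparsify(update, r=0.25)
print(indices, values)
```

In this unprotected form, both `indices` and `values` would be visible to whoever receives the upload, which is the leakage the private schemes are designed to prevent.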