In federated submodel learning (FSL), a machine learning model is divided into multiple submodels according to the types of data used for training. Each user involved in the training process downloads and updates only the submodel relevant to the user's local data, which significantly reduces the communication cost compared to classical federated learning (FL). However, both the index of the submodel a user updates and the values of the updates reveal information about the user's private data. To guarantee information-theoretic privacy in FSL, the model is stored at multiple non-colluding databases, and the user sends queries and updates to each database in such a way that no information is revealed about the index of the updated submodel or the values of the updates. In this work, we consider the practical scenario in which the non-colluding databases may have arbitrary storage constraints. Our goal is to develop read-write schemes and storage mechanisms for FSL that use the available storage in each database efficiently to store the submodel parameters, so that the total communication cost is minimized while information-theoretic privacy of the updated submodel index and the values of the updates is guaranteed. As our main result, we consider both heterogeneous and homogeneous storage-constrained databases and propose private read-write and storage schemes for each of the two cases.