Fully Robust Federated Submodel Learning in a Distributed Storage System

We consider the federated submodel learning (FSL) problem in a distributed storage system. In the FSL framework, the full learning model at the server side is divided into multiple submodels, so that each selected client needs to download only the required submodel(s) and upload the corresponding update(s) in accordance with its local training data. The server comprises multiple independent databases, and the full model is stored across these databases. An eavesdropper passively observes all the storage of, and listens to all the data communicated with, the databases it controls, in order to gain knowledge about the remote client data and the submodel information. In addition, a subset of databases may fail, negatively affecting the FSL process, since the FSL process may take a non-negligible amount of time for large models. To resolve these two issues together (i.e., security and database repair), we propose a novel coding mechanism, coined ramp secure regenerating coding (RSRC), to store the full model in a distributed manner. With our new RSRC method, the eavesdropper is permitted to learn a controllable amount of submodel information in exchange for reduced communication and storage costs. Further, during the database repair process, when the replacement database is constructed, the submodels to be updated are stored in their latest version, obtained from updating clients, while the remaining submodels are obtained in their previous version from other databases through routing clients. Our new RSRC-based distributed FSL approach is built on top of our earlier two-database FSL scheme, which uses private set union (PSU). A complete one-round FSL process consists of an FSL-PSU phase, an FSL-write phase, and additional auxiliary phases. Our proposed FSL scheme is also robust against database drop-outs, client drop-outs, client late-arrivals, and an active adversary controlling databases.
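To make the "ramp" trade-off mentioned above concrete, the following is a minimal sketch of a generic polynomial-based ramp secret sharing scheme. It is not the RSRC construction of this work and omits the regenerating (repair) component entirely. The field size `P`, the function names `share`, `reconstruct`, and `eval_poly`, and the parameters `k`, `n`, and `L` are all assumptions introduced for illustration. A secret of `L` field symbols is padded with `k - L` random symbols and encoded into `n` one-symbol shares. Any `k` shares recover the secret, any `k - L` shares reveal nothing, and intermediate collusion sizes leak progressively more, which is the price paid for shares that are `L` times smaller than the secret.

```python
import secrets

# Hypothetical parameter: a Mersenne prime used as the field size.
P = 2**61 - 1


def eval_poly(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def share(secret, k, n):
    """Encode L = len(secret) symbols into n shares; any k of them reconstruct."""
    L = len(secret)
    assert 1 <= L <= k <= n < P
    # The first L coefficients carry the secret; the rest are uniform randomness.
    coeffs = [s % P for s in secret]
    coeffs += [secrets.randbelow(P) for _ in range(k - L)]
    # Evaluate at distinct nonzero points 1..n; each share is (point, value).
    return [(x, eval_poly(coeffs, x)) for x in range(1, n + 1)]


def reconstruct(shares, k, L):
    """Recover the L secret symbols from k shares via Gauss-Jordan elimination."""
    pts = shares[:k]
    assert len(pts) == k
    # Augmented Vandermonde system: sum_j coeffs[j] * x^j = y (mod P).
    A = [[pow(x, j, P) for j in range(k)] + [y] for x, y in pts]
    for col in range(k):
        piv = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], P - 2, P)  # modular inverse via Fermat
        A[col] = [(v * inv) % P for v in A[col]]
        for r in range(k):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % P for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(L)]


if __name__ == "__main__":
    shares = share([11, 22, 33], k=5, n=7)
    print(reconstruct(shares[2:], k=5, L=3))  # any 5 of the 7 shares suffice
```

With `L = 1` the sketch reduces to ordinary threshold secret sharing, where each share is as large as the secret. Increasing `L` lowers the per-database storage at the cost of the partial leakage described above.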
