Secure aggregation, a core component of federated learning, aggregates locally trained models from distributed users at a central server. The ``secure'' nature of such aggregation lies in the requirement that no information about the users' local data be leaked to the server beyond the aggregate of the local models. To guarantee security, keys may be shared among the users. After the key sharing phase, each user masks its trained model and sends the masked model to the server. This paper follows the information theoretic secure aggregation problem originally formulated by Zhao and Sun, with the objective of characterizing the minimum communication cost from the $K$ users in the model aggregation phase. Due to user dropouts, the server may not receive messages from all users; a secure aggregation scheme should tolerate the dropout of at most $K-U$ users, that is, it should work whenever at least $U$ users remain. Zhao and Sun characterized the optimal communication cost, but under the assumption that the keys stored by the users can be arbitrary random variables with arbitrary dependency. Motivated by the fact that uncoded groupwise keys are more convenient to share and can be used in a wide range of applications beyond federated learning, in this paper we assume that the key variables are mutually independent and that each key is shared by a group of $S$ users. To the best of our knowledge, all existing secure aggregation schemes assign coded keys to the users. We show that if $S > K-U$, a new secure aggregation scheme with uncoded groupwise keys achieves the same optimal communication cost as the best scheme with coded keys; if $S \leq K-U$, uncoded groupwise key sharing is strictly sub-optimal. Finally, we implement the proposed secure aggregation scheme on Amazon EC2 and compare it with existing secure aggregation schemes with offline key sharing.
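To make the idea of masking with uncoded groupwise keys concrete, the following is a minimal Python sketch and not the scheme proposed in the paper. It assumes no user drops out, works over a prime field of size `P` (an arbitrary choice for illustration), and uses hypothetical helper names (`groupwise_keys`, `mask`, `aggregate`). Each group of $S$ users holds one independent uniform key; within a group the coefficients applied to that key sum to zero, so all keys cancel when the server adds the messages. The sketch only demonstrates this cancellation and makes no claim about dropout tolerance, security guarantees, or communication cost.

```python
import itertools
import secrets

P = 2**31 - 1  # prime field size; an assumption made for this sketch


def groupwise_keys(K, S, L):
    """Draw one independent uniform key vector of length L per S-subset of the K users."""
    return {
        group: [secrets.randbelow(P) for _ in range(L)]
        for group in itertools.combinations(range(K), S)
    }


def mask(user, x, keys):
    """Mask input vector x with every groupwise key that `user` holds.

    In each group, all members but the last add the key and the last member
    subtracts (S - 1) times the key, so the coefficients sum to zero.
    """
    y = list(x)
    for group, z in keys.items():
        if user not in group:
            continue
        coeff = 1 if user != group[-1] else -(len(group) - 1)
        y = [(yi + coeff * zi) % P for yi, zi in zip(y, z)]
    return y


def aggregate(messages):
    """Server side: add the received masked vectors coordinate-wise modulo P."""
    return [sum(column) % P for column in zip(*messages)]


# Toy run: K = 4 users, groups of S = 2, model vectors of length L = 3.
K, S, L = 4, 2, 3
keys = groupwise_keys(K, S, L)
inputs = [[k + 1] * L for k in range(K)]
messages = [mask(k, inputs[k], keys) for k in range(K)]
true_sum = [sum(column) % P for column in zip(*inputs)]
assert aggregate(messages) == true_sum
```

Handling the dropout of up to $K-U$ users requires additional structure beyond this cancellation trick, which is where the condition $S > K-U$ stated above enters.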