This paper considers the secure aggregation problem for federated learning under an information-theoretic cryptographic formulation, where distributed training nodes (referred to as users) train models based on their own local data and a curious-but-honest server aggregates the trained models without learning any other information about the users' local data. Secure aggregation generally consists of two phases: a key sharing phase and a model aggregation phase. Because user dropouts are common in federated learning, the model aggregation phase should contain two rounds: in the first round the users transmit masked models, and in the second round the users that survived the first round transmit further messages, chosen according to the identity of the surviving users, to help the server decrypt the sum of the surviving users' trained models. The objective of the considered information-theoretic formulation is to characterize the capacity region of the communication rates from the users to the server in the two rounds of the model aggregation phase, assuming that key sharing has already been performed offline beforehand. In this context, Zhao and Sun completely characterized the capacity region under the assumption that the keys can be arbitrary random variables. More recently, an additional constraint, known as "uncoded groupwise keys," has been introduced. Under this constraint the system contains multiple mutually independent keys, each of which is shared by exactly S users. The capacity region for the information-theoretic secure aggregation problem with uncoded groupwise keys was established in our recent work subject to the condition S > K - U, where K is the total number of users and U is the designed minimum number of surviving users. In this paper we fully characterize the capacity region for this problem by proposing a new converse bound and a new achievable scheme.
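To make the two-round structure concrete, the following is a minimal toy sketch in Python of masked aggregation with uncoded pairwise keys, that is, groupwise keys with S = 2. It is an illustration only and is not the scheme proposed in this paper: it is not rate-optimal, it assumes that no user drops out between the first and second rounds, and it treats each model as a single integer. The modulus `Q` and all function names are hypothetical choices made for this example.

```python
import random

Q = 2**31 - 1  # toy prime modulus (an assumption of this sketch)

def share_pairwise_keys(num_users, rng):
    """Key sharing phase: one independent key per pair of users (S = 2)."""
    return {(i, j): rng.randrange(Q)
            for i in range(num_users) for j in range(i + 1, num_users)}

def signed_key(keys, i, j):
    """Key shared by users i and j, as used by user i: +k if i < j, else -k."""
    return keys[(i, j)] if i < j else -keys[(j, i)]

def round1_message(i, model, keys, num_users):
    """Round 1: user i sends its model masked by all of its pairwise keys."""
    mask = sum(signed_key(keys, i, j) for j in range(num_users) if j != i)
    return (model + mask) % Q

def round2_message(i, survivors, keys, num_users):
    """Round 2: surviving user i sends the part of its mask tied to dropped users."""
    return sum(signed_key(keys, i, j)
               for j in range(num_users) if j not in survivors) % Q

def aggregate(round1, round2):
    """Server: keys between two survivors cancel; round 2 removes the rest."""
    return (sum(round1.values()) - sum(round2.values())) % Q

# Toy run: K = 4 users, user 2 drops out before sending its round-1 message.
rng = random.Random(0)
num_users = 4
models = [5, 11, 7, 3]
keys = share_pairwise_keys(num_users, rng)
survivors = {0, 1, 3}
round1 = {i: round1_message(i, models[i], keys, num_users) for i in survivors}
round2 = {i: round2_message(i, survivors, keys, num_users) for i in survivors}
total = aggregate(round1, round2)
```

In this run the server recovers only the sum of the surviving users' models. Keys shared between two surviving users cancel in the sum of the round-1 messages, and the round-2 messages remove the keys shared with the dropped user. The sketch says nothing about the capacity region itself, which concerns how short the messages in each round can be relative to the model length.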