This paper studies federated learning (FL), especially cross-silo FL, with data from people who do not trust the server or other silos. In this setting, each silo (e.g., a hospital) has data from different people (e.g., patients) and must maintain the privacy of each person's data (e.g., a medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g., patient) in silo i (e.g., hospital i) cannot be leaked. ISRL-DP differs from well-studied privacy notions. Central and user-level DP assume that people trust the server and other silos. At the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithmic factors) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.
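To make the ISRL-DP requirement concrete, the following is a minimal sketch of a single communication round in which each silo privatizes its own message before anything leaves the silo. It is not the paper's algorithm. It assumes per-record gradient clipping plus the classic Gaussian mechanism (valid for 0 < eps < 1), a replace-one-record neighboring relation, and no composition across rounds. All function names and parameters (`clip`, `gaussian_sigma`, `silo_message`, `server_average`, `clip_norm`) are hypothetical.

```python
import math
import random


def clip(vec, clip_norm):
    """Scale vec so that its Euclidean norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def gaussian_sigma(sensitivity, eps, delta):
    """Noise scale of the classic Gaussian mechanism (assumes 0 < eps < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def silo_message(record_grads, clip_norm, eps, delta, rng):
    """Noisy average gradient that one silo sends to the untrusted server.

    Assumption: replacing one record changes the clipped mean by at most
    2 * clip_norm / n in Euclidean norm, which is the sensitivity used here.
    """
    n = len(record_grads)
    dim = len(record_grads[0])
    clipped = [clip(g, clip_norm) for g in record_grads]
    mean = [sum(g[j] for g in clipped) / n for j in range(dim)]
    sigma = gaussian_sigma(2.0 * clip_norm / n, eps, delta)
    # Noise is added inside the silo, so the server never sees the raw mean.
    return [m + rng.gauss(0.0, sigma) for m in mean]


def server_average(messages):
    """The server only ever averages already-privatized silo messages."""
    k = len(messages)
    dim = len(messages[0])
    return [sum(m[j] for m in messages) / k for j in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical silos, each holding 100 synthetic per-record gradients.
    silos = [
        [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(100)]
        for _ in range(3)
    ]
    msgs = [silo_message(g, 1.0, 0.5, 1e-5, rng) for g in silos]
    print(server_average(msgs))
```

Because the noise is added inside each silo, the privacy of a record does not depend on the server or the other silos behaving honestly, which is the trust model described above. A multi-round method would also need a privacy accounting step, which this sketch omits.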