This paper studies federated learning (FL), especially cross-silo FL, with data from people who do not trust the server or other silos. In this setting, each silo (e.g., a hospital) holds data from different people (e.g., patients) and must maintain the privacy of each person's data (e.g., a medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g., a patient) in silo i (e.g., hospital i) cannot be leaked. ISRL-DP differs from well-studied privacy notions. Central and user-level DP assume that people trust the server and the other silos. At the other end of the spectrum, local DP assumes that people do not trust anyone at all, not even their own silo. Sitting between central and local DP, ISRL-DP makes the assumption, realistic in cross-silo FL, that people trust their own silo but not the server or other silos. In this work, we provide upper and lower bounds for ISRL-DP FL that are tight up to logarithmic factors, for convex and strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state of the art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.
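To make the ISRL-DP trust boundary concrete, the following is a minimal sketch of a single communication round in which each silo privatizes its message before anything leaves the silo. It is an illustration only, not the algorithm analyzed in the paper: the abstract does not specify the mechanism, so per-record gradient clipping, the classical Gaussian mechanism (calibrated for 0 < epsilon < 1), and all function and parameter names (`silo_message`, `clip_norm`, and so on) are assumptions. The sketch covers one round; it does not account for privacy composition across rounds, acceleration, or the shuffler.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip(vec, clip_norm):
    """Rescale vec so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def silo_message(per_record_grads, clip_norm, epsilon, delta, rng):
    """One silo's message for one round: a noisy average of clipped gradients.

    Noise is added inside the silo, so the server and the other silos
    only ever see the privatized vector.
    """
    n = len(per_record_grads)
    dim = len(per_record_grads[0])
    clipped = [clip(g, clip_norm) for g in per_record_grads]
    mean = [sum(g[j] for g in clipped) / n for j in range(dim)]
    # Replacing one record changes the clipped average by at most 2 * clip_norm / n.
    sigma = gaussian_sigma(2.0 * clip_norm / n, epsilon, delta)
    return [m + rng.gauss(0.0, sigma) for m in mean]


def server_aggregate(messages):
    """Untrusted server: averages the already-privatized silo messages."""
    k = len(messages)
    return [sum(m[j] for m in messages) / k for j in range(len(messages[0]))]


# Hypothetical toy data: two silos, each holding a few 2-dimensional per-record gradients.
rng = random.Random(0)
silo_grads = [
    [[0.5, -1.0], [2.0, 0.1], [-0.3, 0.4]],
    [[1.5, 1.5], [0.0, -0.2]],
]
messages = [
    silo_message(g, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
    for g in silo_grads
]
update = server_aggregate(messages)
```

The key design point is where the noise is injected. Under central DP the server would see exact silo averages and add noise itself. Under local DP each record would be randomized before reaching even its own silo. Here the silo sees raw records but releases only a noisy aggregate, which matches the intermediate trust model described in the abstract.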