We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this setting, every silo (e.g., a hospital) holds data from several people (e.g., patients) and needs to protect the privacy of each person's data (e.g., health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential Privacy (ISRL-DP) prevents each silo's data from being leaked by requiring that the communications of every silo i satisfy item-level differential privacy. Prior work (arXiv:2203.06735) characterized the optimal excess risk bounds for ISRL-DP algorithms with homogeneous (i.i.d.) silo data and convex loss functions. However, two important questions were left open: (1) Can the same excess risk bounds be achieved with heterogeneous (non-i.i.d.) silo data? (2) Can the optimal risk bounds be achieved with fewer communication rounds? In this paper, we give positive answers to both questions. We provide novel ISRL-DP FL algorithms that achieve the optimal excess risk bounds in the presence of heterogeneous silo data. Moreover, our algorithms are more communication-efficient than the prior state-of-the-art. For smooth loss functions, our algorithm achieves the optimal excess risk bound and has communication complexity that matches the non-private lower bound. Additionally, our algorithms are more computationally efficient than the previous state-of-the-art.
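To make the ISRL-DP requirement concrete, the following is a minimal sketch of how a single silo might privatize one message before sending it. It is not the algorithm proposed in the paper. It assumes the standard Gaussian mechanism applied to an average of norm-clipped per-record gradients, covers a single release only (no composition across rounds), and uses the classic noise calibration, which is valid only for 0 < epsilon < 1. The function names `clip`, `gaussian_sigma`, and `silo_message` and all parameters are hypothetical.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [x * max_norm / norm for x in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def silo_message(per_record_grads, clip_norm, epsilon, delta, rng):
    """Return a noisy average gradient that one silo could send to the server.

    Assumption: replacing one record changes the clipped average by at most
    2 * clip_norm / n in L2 norm, which is the sensitivity used below.
    """
    n = len(per_record_grads)
    dim = len(per_record_grads[0])
    clipped = [clip(g, clip_norm) for g in per_record_grads]
    mean = [sum(g[j] for g in clipped) / n for j in range(dim)]
    sigma = gaussian_sigma(2.0 * clip_norm / n, epsilon, delta)
    # Noise is added inside the silo, so neither the server nor other silos
    # ever see the un-noised average.
    return [m + rng.gauss(0.0, sigma) for m in mean]
```

Because the noise is added locally, the privacy guarantee of this single message does not depend on trusting the server or the other silos, which is the property the ISRL-DP definition asks of every silo's communications.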