We study federated learning (FL), especially cross-silo FL, with non-convex loss functions and data from people who do not trust the server or other silos. In this setting, each silo (e.g., a hospital) must protect the privacy of each person's data (e.g., a patient's medical record), even if the server or other silos act as adversarial eavesdroppers. To that end, we consider inter-silo record-level (ISRL) differential privacy (DP), which requires silo $i$'s communications to satisfy record/item-level DP. We propose novel ISRL-DP algorithms for FL with heterogeneous (non-i.i.d.) silo data and two classes of Lipschitz continuous loss functions. First, we consider losses satisfying the Proximal Polyak-Lojasiewicz (PL) inequality, an extension of the classical PL condition to the constrained setting. By contrast, prior works only considered unconstrained private optimization with Lipschitz PL losses. This rules out most interesting PL losses, such as strongly convex problems (a Lipschitz function cannot be strongly convex on all of $\mathbb{R}^d$) and linear/logistic regression. Our algorithms nearly attain the optimal strongly convex, homogeneous (i.i.d.) rate for ISRL-DP FL without assuming convexity or i.i.d. data. Second, we give the first private algorithms for non-convex non-smooth loss functions. Our utility bounds even improve on the state-of-the-art bounds for smooth losses. We complement our upper bounds with lower bounds. Additionally, we provide shuffle DP (SDP) algorithms that improve over the state-of-the-art central DP algorithms under more practical trust assumptions. Numerical experiments show that our algorithm achieves better accuracy than the baselines for most privacy levels. All code is publicly available at: https://github.com/ghafeleb/Private-NonConvex-Federated-Learning-Without-a-Trusted-Server.
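To make the ISRL-DP trust model concrete, here is a minimal sketch, assuming noisy clipped minibatch SGD with the classical Gaussian mechanism, a logistic loss, and a hypothetical Euclidean-ball constraint set (whose proximal step is a projection). It is not the authors' algorithm. It only illustrates that each silo privatizes its own message before it leaves the silo, so the server needs no trust. All function names (`silo_message`, `run_rounds`, etc.) are hypothetical. The noise level covers a single round only; composition across rounds and any tighter accounting are deliberately ignored.

```python
import numpy as np

def per_record_grad(w, x, y):
    # Logistic-loss gradient for one record (x, y) with label y in {0, 1}.
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def clip(g, clip_norm):
    # Scale g so its L2 norm is at most clip_norm (bounds one record's influence).
    norm = np.linalg.norm(g)
    return g if norm <= clip_norm else g * (clip_norm / norm)

def gaussian_sigma(clip_norm, batch_size, epsilon, delta):
    # Classical Gaussian-mechanism noise for ONE release (assumes epsilon < 1).
    # Replacing one record moves the clipped minibatch mean by <= 2*C/K in L2.
    sensitivity = 2.0 * clip_norm / batch_size
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def project_ball(w, radius):
    # Projection onto the hypothetical constraint set {w : ||w|| <= radius},
    # i.e. the proximal step for that set's indicator function.
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def silo_message(w, X, y, batch_size, clip_norm, sigma, rng):
    # Runs inside silo i: noise is added BEFORE anything leaves the silo,
    # so the server and other silos only ever see privatized messages.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    grads = [clip(per_record_grad(w, X[j], y[j]), clip_norm) for j in idx]
    return np.mean(grads, axis=0) + rng.normal(0.0, sigma, size=w.shape)

def run_rounds(silos, rounds, lr, batch_size, clip_norm, sigma, radius, seed=0):
    # The (untrusted) server only averages noisy messages and projects.
    rng = np.random.default_rng(seed)
    w = np.zeros(silos[0][0].shape[1])
    for _ in range(rounds):
        msgs = [silo_message(w, X, y, batch_size, clip_norm, sigma, rng)
                for X, y in silos]
        w = project_ball(w - lr * np.mean(msgs, axis=0), radius)
    return w

# Usage on synthetic, heterogeneous (non-i.i.d.) silos.
data_rng = np.random.default_rng(1)
silos = []
for shift in (-1.0, 0.0, 2.0):  # different feature means per silo
    X = data_rng.normal(shift, 1.0, size=(200, 3))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(float)
    silos.append((X, y))

sigma = gaussian_sigma(clip_norm=1.0, batch_size=32, epsilon=0.5, delta=1e-5)
w = run_rounds(silos, rounds=50, lr=0.5, batch_size=32, clip_norm=1.0,
               sigma=sigma, radius=5.0)
print(w)
```

The key design point is where the noise is added: inside `silo_message`, not on the server. A central-DP variant would instead add noise after aggregation, which requires trusting the server.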